PREFACE


This book (in two volumes) is primarily meant for the Two Year Degree
Course for B. Com. (Hons.) students under the new syllabus of Calcutta
University. There are many students in the B. Com. (Hons.) Course who
have passed the previous Plus Two Course without taking Mathematics as
one of the subjects. This book is prepared keeping those students in 
mind. 


Different chapters in the prescribed syllabus have been expounded 
with great care with the help of explanatory notes followed by suitable 
examples. For the guidance of all types of students quite a large 
number of hard examples have been worked out in all the chapters. 
Advanced exercises have been provided for the benefit of ambitious 
students. While writing the book, we have always kept in mind the 
nature and types of problems usually set in different examinations. 


Apart from University examinations, a few essential questions set in
the C.A. and I.C.W.A. of India Examinations have been given in the book
for the interest of students. The authors thank the institutions
mentioned for the kind permission given to make use of such questions.


In preparing this book we had to consult many books written by 
Indian and foreign authors. Professors D. Roy of Gobardanga Hindu
College, G. S. Mukherjee and K. Debnath of Umeschandra College,
N. L. Lahiri and R. K. Bhattacharjee of City College of Commerce
encouraged and helped us in many respects. We express our indebtedness
and heartfelt gratitude to all of them.


Sri Subhash Chandra Ghosh, an ex-student, took an active part in
laying out the Charts and Tables and also did thorough proof-reading
while the book was being printed. We are pleased to record his untiring effort.


The authors are also thankful to Sri Mahendra Nath Paul of The 
New Book Stall and the authorities of K. P. Basu Printing Works. 


In spite of our best efforts, some mistakes might have crept in.
We shall be highly obliged if any reader kindly brings such mistakes 
to our notice. Suggestions, if any, for the improvement of the book 
will be gratefully acknowledged. 


CALCUTTA                                        AUTHORS
Nov., 1981
BUSINESS STATISTICS


INTRODUCTION 


The word STATISTICS was originally, in earlier ages, used
for the collection and arrangement of facts, not necessarily of a
numerical type, about a state or the people belonging to a state.
Nowadays the word STATISTICS is generally used in connection with
census operations and the collection of information regarding the social
and economic status of different people in different parts of a country.


The theory of probability, the basic principle of statistics, was
first discussed by G. Cardano. The theory is recognised today
as one of the fundamental laws of statistics, and statistical conclusions
are largely based on it. The studies were later extended
by Gauss, Bayes, Euler, Lagrange, Hain, Knapp and Lexis, to mention
only a few names.


Statistics in India. 

In India, statistics were collected during the reigns of Ashoka,
the Gupta Dynasty and the Mughal rulers. Kautilya's Artha-shastra
is replete with statistics of land, prices, wages, population
etc., collected during Maurya rule. It may be noted that Todar Mal,
the Finance Minister of Akbar, compiled statistics of land,
agriculture, trade etc. for placing the land revenue and tax system
in a systematic order.


In the second half of the 18th century the East India Company also
started collecting statistics, mainly on agriculture. The publication
‘Statistical Abstract of British India’, issued under the British
Government, contains a lot of statistical information.


Sense. 
The word statistics is now used in two different senses:

(i) Statistics, as a plural noun, means a collection of numerical
facts (statistics of birth or death) or quantities derived from numerical
facts, i.e. percentages, averages, estimates regarding any population etc.

(ii) Statistics, as a singular noun, on the other hand, refers
to the various methods, called statistical methods, adopted for the
collection, analysis and interpretation of numerical data.


It should be noted here that a single and unconnected figure
cannot be called statistics. A single figure is incapable of comparison,
analysis or interpretation. There should be at least two figures.
Further, the figures in question should be capable of being placed
in relation to each other.


In whatever sense the word statistics is used, it should
always be remembered that the subject is mainly concerned with facts
expressed in numerical form, i.e. with quantitative details and not
with qualitative descriptions.


Relations.


Statistics and Mathematics. During the 17th century, the
methods of statistical science were used under the name of Political
Arithmetic. In the 18th century, a relation between statistics and
mathematics was formed on the basis of the theory of probability
when Jacob Bernoulli (1654-1705) stated the ‘Law of large numbers’
in his great work Ars Conjectandi, published eight years after his
death. L. A. J. Quetelet (1796-1874) also emphasised the importance
of the ‘Law of large numbers’. Daniel Bernoulli (1700-1782) laid a solid
foundation for the theory of probability. On these foundations,
laid by the mathematicians (only a few of whom are mentioned here),
the modern theory of statistics was gradually built up.


Statistics and Economics. The relationship between these two
sciences became intimate rather late, although a reference to the
relationship was made by Sir William Petty in his work, Political Arithmetic,
published in 1690. By the 18th century, statistical data relating
to population, taxes, agriculture, industry, trade etc. used to be
collected in most civilised countries, but there was no relationship
between statistical information and economic theory. In 1871,
W. S. Jevons wrote in his Theory of Political Economy that “the
deductive science of economy must be verified and rendered useful
by the purely inductive science of statistics. Theory must be invested
with the reality and life of fact. Political economy could gradually be
developed into an exact science, if commercial statistics were far more
complete and precise.” He developed the technique of analysis of
time series. Rightly he has been called the ‘Father of Index Numbers’.
Besides Jevons, the Historical School (1848-1883) brought Statistics
and Economics closer. In fact Roscher, Knies, Hildebrand and
Cliffe Leslie believed that economic doctrines should not be argued
in the abstract, but should be inductively proved. The effect was indeed
great and the science of economics no longer remained deductive
in approach. By the end of the 19th century, the attitude of economists
towards the inductive method had become friendly. In 1907,
Alfred Marshall wrote that disputes as to the methods of study
in economics had ceased, that qualitative analysis had performed
the greater part of its work and that progress in quantitative
analysis depended upon the growth of realistic statistics. He explained
that induction and deduction were both needed for scientific thought, 
as the right and left feet were both needed for walking. 


Since 1890, two factors have brought about a fundamental
change in the relationship of statistics and economics. The first is
the development of statistical methods, like probability sampling,
correlation, periodicity and index numbers etc.; the second is the
enlargement of statistical data in recent years. During the period,
eminent statisticians like August Meitzen, Karl Pearson, G. U. Yule,
C. V. Davenport, A. L. Bowley, W. Persons, R. A. Fisher etc. have
made valuable contributions to the development of the science.


The improvement of statistical methods and enlargement of 
statistical data have thus brought statistics and economics very close 
to each other. 


Limitations of the Science of Statistics. 


1. Statistics studies only quantitative phenomena. One of the
important limitations of the science is that it deals only with those
phenomena which can be expressed by quantity. Phenomena
which cannot be expressed in figures, like bravery, honesty, intelligence
etc., are of little use. Of course efficiency, intelligence etc. can be
compared on the basis of marks obtained, but this is only
an indirect method of approach.


2. Statistical laws are true only on an average. It is known
that the laws of physics, mathematics, chemistry etc. are exact and
universally applicable. The laws of statistics are not so. Statistics
deals with phenomena which are affected by a multiplicity of causes,
and it is not possible to study the effect of each of these factors
individually as is done by experimental methods. Due to this limitation,
the results obtained are not perfectly accurate but only approximate.


3. Statistics deals with aggregates and not with individuals.
Statistics deals with aggregates, although these aggregates are often
reduced to single figures for analysis. A series of figures is condensed 
into an average for comparison, but an individual item of the same 
series has no specific recognition. 


4. It is liable to be misused. Any person can misuse statistics
and draw any type of conclusion he desires. Statistical methods
can be properly used only by those who have a sound knowledge of
them, and their use by less expert hands is sure to give inaccurate
results.


Those who use statistics must be aware of these limitations.
According to W. I. King, “statistics is a most useful servant but only 
of great value to those who understand its proper use.” 
Characteristics.


(i) Statistics must be quantitatively expressed: Phenomena
which cannot be expressed numerically, like intelligence, honesty,
bravery etc., are of little use in statistics. Qualitative expressions like
young, middle-aged or old should be expressed by, say, 25, 45 or
65 years.


(ii) Statistics are always aggregates: Single and unconnected
figures are not statistics as they cannot be studied in relation to each
other. A single age of 20 or 25 years is not statistics, but a series of
ages is. A single birth or sale is not statistics, while a number of
births or sales are, since they can be studied in relation to time or
place.


(iii) Relation to enquiry: The significance of certain figures 
can be better appreciated when they are compared with others of the 
same type. 


(iv) Relation to each other: Statistics are generally collected to
facilitate comparison in point of time, place or condition. If, however,
the collected data are unfit for comparison, then much of their
importance is lost. For this purpose, the figures should be of the same
nature. For instance, ages of husbands are to be compared only with
the corresponding ages of wives and not with the lengths of trees.


Division of Work.
The whole work of a statistician can be broadly divided as 
follows : 
(i) Collection of data 
(ii) Classification and tabulation 
(iii) Analysis 
(iv) Interpretation 


COLLECTION OF DATA 


A statistician begins the work with the collection of data, i.e.
numerical facts. The data so collected are called raw materials (or
raw data). It is these raw materials that a statistician analyses, after
proper classification and tabulation, to reach the final decision or
conclusion. Therefore it is undoubtedly important that the raw data
collected should be clear, accurate and reliable.


Before the collection of data, every enquiry must have a definite 
object and certain scope, that is to say, what information will be 
collected, for whom it will be collected, how often or at what 
periodicity it will be collected and so on. If the object and the
scope of the enquiry are not clearly determined beforehand, difficulties
may arise at the time of collection, which will simply mean a wastage
of time and money.


Statistical Unit. 


The unit of measurement applied to the data in any particular
problem is the statistical unit. 

Physical units of measurement like quintal, kilogramme, metre,
hour and year etc. do not need any explanation or definition. But
in some cases the statistician has to give a proper definition
of the unit. Take, for example, the wholesale price of a commodity.
What does the term ‘wholesale price’ signify? Does it stand
for the price at which the producer sells the goods concerned to the
stockist, or the price at which the stockist sells to a wholesaler?
Is it the price at which the market opened on the day of enquiry?
Many such problems may arise. It is thus essential that
a statistician should define the units of data before he starts the work
of collection.


Requirements. 


1. Its definition must be unambiguous, simple and complete in
itself. If the unit is not definite, the data collected might be inaccurate.
So it is necessary that the units should be properly defined.


2. It should be stable in character. If there is any fluctuation,
the data cannot be compared. For example, if one seer is equal to
0.92 kg at one place, 0.90 kg at another and 1 kg at a third, the data
collected can never be compared.

3. The unit should be homogeneous, i.e. the unit should imply
the same characteristic and should also be uniform throughout the
enquiry. If the data are not homogeneous, comparison cannot be
made. If the data are heterogeneous, they may be broken
up into small homogeneous classes. For example, if data
relating to failures in a certain examination are being collected,
the failures can be divided into a number of classes,
subject-wise or by absence (total or partial).


4. The statistical unit should be appropriate to the enquiry.


Types. 
The units are of two types :— 


1. Units of collection. 
2. Units of analysis and interpretation. 


Units of collection are those units in terms of which measurements
are made (or estimated). Production of the cotton textile
industry in bales, consumption of electricity in kilowatt-hours and
production of a cereal in tons are examples of simple units.
Ton-mile, industrial accident, credit sale etc. are examples
of composite units. Here ton-miles indicate the number of tons
multiplied by the number of miles carried. A composite unit is formed
by adding a qualifying word to a simple unit.


Units of analysis are those units in terms of which data are 
compared. They include ratios, percentages, rates etc. All these 
are useful for comparison. 
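
The distinction between a composite unit of collection and a unit of analysis can be sketched in a few lines of Python. This is only an illustration: the function names and the consignment figures are hypothetical and do not come from any actual enquiry.

```python
def ton_miles(tons, miles):
    """Composite unit of collection: tons carried multiplied by miles carried."""
    return tons * miles


def percentage(part, whole):
    """Unit of analysis: a part expressed as a percentage of the whole."""
    return part / whole * 100


# Hypothetical consignments, each recorded as (tons, miles).
consignments = [(10, 50), (4, 120)]

# Total traffic in ton-miles: 10*50 + 4*120 = 980.
total = sum(ton_miles(t, m) for t, m in consignments)

# Share of the first consignment in the total traffic, as a percentage.
share = percentage(ton_miles(10, 50), total)

print(total, round(share, 1))
```

The ton-mile figures are what would be collected; the percentage is what would be used for comparison.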


Degree of Accuracy. 


Another point that should be decided in advance is the degree
of accuracy to which the data are to be collected. For instance, it
should be decided whether prices or wages are to be quoted correct to
the paisa or the rupee, and similarly weights to the gramme or the
kilogramme. The aim should be to determine the unit in advance, clearly
and precisely, so as to avoid wastage of time and labour.


Types and Methods of Collection of Data. 


Statistical data are usually of two types: (i) Primary and
(ii) Secondary.


Data which are collected for the first time, for a specific
purpose, are known as primary data, while those used in an
investigation, which have been originally collected by someone
else, are known as secondary data.

For example, data gathered by the Office of the Registrar
General of Census Operations are primary data, but are secondary
when used by others.


On the basis of primary and secondary data the methods of 
collecting statistical data have been divided into Primary 
method and Secondary method. 


Distinguish between Primary and Secondary Data. 


1. Primary data are those data which are collected for the first 
time and thus original in character. Secondary data are those data that 
have already been collected earlier by some other persons. 


2. Primary data are in the form of raw materials to which
statistical methods are applied for the purpose of analysis. On the
other hand, secondary data are in the form of finished products as they
have already been statistically processed.


3. Primary data are collected directly from the people to whom
the enquiry relates. Secondary data are collected from published
materials.


4. If observed closely, the difference is one of degree only. Data
are primary to the institution collecting them, while they are secondary
for all others. Thus data which are primary in the hands of one are
secondary in the hands of another.


Primary Method. 
The following methods are common in use :— 


(i) Direct Personal Observation. Under this method, the 
investigator collects the data personally. He has to go to the 
spot for conducting enquiry and has to meet the persons concerned. 
It is essential that the investigator should be polite and tactful and
have a sense of observation.


This method is applicable when the field of enquiry is small and
greater accuracy is intended. This method, however,
gives satisfactory results only if the investigator is fully dependable.


(ii) Indirect Oral Investigation. In this method data are
collected through indirect sources. Persons having some knowledge
regarding the enquiry are cross-examined and the desired
information is collected. The evidence of one person should not be
relied upon; a number of views should be taken to find out the real
position. This method is usually adopted by enquiry committees
or commissions appointed by governments, semi-government bodies
or private institutions. Certain precautions are to be taken here.
Firstly, it should be seen whether the informant knows the full facts of
the problem under investigation. Secondly, it should be ensured
that the person questioned is not prejudiced and is not motivated
to colour the facts. Of course, due allowance should be made for
optimism and pessimism.


(iii) Schedules and Questionnaires. A list of questions regarding
the enquiry is prepared and printed. Data are collected in either of
the following ways.


(a) By sending the questionnaire to the persons concerned with a
request to answer the questions and return the questionnaire.


Success in this method depends entirely on the co-operation of
the informants. The advantage of this method is that it is less
costly, as no enumerators are required, and investigations can be
completed within a short time.


The disadvantages are that many individuals do not return the
forms in time and some make mistakes in filling
up the forms.


(b) By sending the questionnaires through enumerators for
helping the informants.


In this method, enumerators go to the informants to help them
in filling in the answers. This method is useful for extensive enquiries,
but it is expensive. The population census is conducted by this method.
It is essential that enumerators should be polite and have proper training.
The implications and scope of each question to be asked of the
informants should be explained clearly to the enumerators. They should
be instructed how to check apparently wrong replies. They should
have the intelligence and capacity to cross-examine the informants for
finding out the true result.


(iv) Local Reports. This method does not imply a formal 
collection of data. Only local agents or correspondents are requested 
to supply the estimates required. This method gives only approximate 
results, of course at a low cost. 


Questionnaires. 
For satisfactory investigation a questionnaire should possess the
following points:


(i) The schedule of questions must not be lengthy. Many
questions may arise during the preparation of a questionnaire. If all of
them are included, the persons who are interviewed
may feel bored and reluctant to answer all the questions. So only
the important questions are to be included.

(ii) It should be simple and clear. The questions should be
understandable even by the most uneducated people so that informants
do not find any difficulty in furnishing the answers. The
factors of simplicity and clarity also imply that the questions should
be few so that the informant may not be confused. If possible, the
questions should be so framed that they require brief answers, viz.
‘yes’, ‘no’ or a ‘number’ etc.

(iii) Each question should be brief and must aim at some particular
information necessary for the investigation of the problem.
Lengthy questions may be split up into smaller parts, which will be
easily grasped by the informants.

(iv) Questions on personal matters like income or property should
be avoided as far as possible, as people are generally reluctant to
disclose the truth. In such cases, the information may be collected by
guess-work.

(v) The questions should be arranged in logical sequence.
The first part may contain questions like name, age, address etc., and
serious or personal questions should be set at the end of the
questionnaire so that the informant may answer them when he feels at ease
with the interviewer.

(vi) The units of information should be clearly shown in the
schedule. For example,

State your age: ...... years ...... months
What is your weight? ...... kg

Example. 


The following form was used in the census of population of
India, 1961, for taking a census of Scientific and Technical Personnel.


CENSUS OF INDIA 1961: SCIENTIFIC & TECHNICAL PERSONNEL

Only a person with a recognised Degree or Diploma in Science,
Engineering, Technology or Medicine should fill in this card

READ CAREFULLY BEFORE FILLING IN.
TICK ( ) WITHIN BRACKETS PROVIDED WHERE APPLICABLE.
CENSUS LOCATION CODE ..........

[ Form Contd. ]

4. PERMANENT ADDRESS ..........
5. (a) Male ( )   (b) Female ( )
6. (a) [illegible] ( )   (b) Married ( )
7. On Feb. 1st, 1961 were you
   (a) Employed? ( )   If so, monthly total income ..........
   (b) Full-time student? ( )
   (c) Unemployed? ( )   If so, how long? ...... yr. ...... months
   (d) Retired? ( )
8. ACADEMIC QUESTIONS (ANSWER FULLY)
   Degree/Diploma | Subject taken | Division | Year of passing

If employed, fill in Qs. 9-12
9. Nature of employment
   a. Teaching in School ( )
   b. Teaching in College ( )
   c. Technical in Industry ( )
   d. Technical outside Industry ( )
   e. [illegible] ( )
10. Any Research Assignment?   Yes ( ) / No ( )
11. Where employed?
   a. Public Sector ( )
   b. Private Sector ( )
   c. Self Employment ( )
12. How employed?
   a. Permanent ( )
   b. [illegible] ( )
   c. On contract ( )
   d. Research Scholar ( )
   e. Otherwise ( )

Date ..........                    Signature ..........


Secondary Method.
The main sources from which secondary data are collected 
are given below— 


(i) Official publications of the central and state governments and
District Boards.

(ii) Reports of Committees and Commissions.

(iii) Publications of Research Institutions and Universities.

(iv) Economic and Commercial Journals.

(v) Publications of Trade Associations, Chambers of Commerce
etc.

(vi) Market reports and individual research works of statisticians.

Secondary data are also available from unpublished records of
government offices, chambers of commerce, labour bureaus etc.


Editing and Scrutiny. 

Secondary data should be used only after careful enquiry and
with due criticism. It is advisable not to take them at their face
value. Scrutiny is essential because the data might be inaccurate,
unsuitable or inadequate. According to Bowley, “It is never safe
to take published statistics at their face value without knowing their
meanings and limitations.........”


Secondary data may, however, be used provided they possess the
attributes (i.e. qualities) shown below:


1. Data should be reliable. The reliability of data depends on
the following queries:

(a) The sources of the original collector's information.
(b) The original compiler's reference.
(c) The method of collection, including instructions given to the
enumerators.
(d) The period of collection of the data.
(e) The degree of accuracy desired and achieved by the compiler.


2. Data should be suitable. For the purpose of investigation,
even reliable data should be avoided if they are found to be not
suitable for the purpose concerned. Data suitable for one enquiry 
may be unsuitable for the other. 


3. They should be adequate. Even reliable and suitable
data may sometimes be inadequate for the enquiry. The original
data may refer to a certain market price during a disturbed period;
for a normal period such a reference will be inadequate.


Representative Data. 


Statistical investigation may be complete or partial. Complete
investigation, commonly known as Census, is the enquiry into each
individual item of the universe or population. Partial investigation,
commonly known as Sample Survey, is the enquiry into only some
portion of the universe. The portion selected is called a Sample.


A sample survey is much less expensive than a census. In
statistics the word universe (or population) is used to denote the totality
of items covering the whole range of enquiry.


A population is finite or infinite according as it contains a finite or
infinite number of members. For example, the population of class-one
cricket players in India is finite, while the population of temperatures
of Delhi on any day is infinite, for though the temperature varies between
two finite limits, it can take an infinity of values between these two limits.
A sample, as stated, is just a portion of the population. Selection
of the sample is an important point. A sample survey will show
dependable results only if the sample is truly representative of
the universe. The methods of selecting a sample are as follows:


(i) Deliberate or Purposive Selection. In this method the
investigator deliberately selects such items as he feels are truly
representative of the universe. The main drawback of this method
is that the personal element has a great chance of entering into the
selection of the sample.

Two independent purposive samples may give widely different
estimates of the same population.


(ii) Random Sampling (or chance selection). It is the method
of drawing a sample from a universe such that every member of the
universe has an equal chance of being included in the sample. It is
in fact a lottery method of selection.

Example. To select a random sample of 4 boys from 99 boys, other than
by applying the lottery method, we can select the sample in the following way.
Let the roll nos. of the boys be 1, 2, 3, ..., 98, 99.

Tables of random numbers formed from the digits 1, 2, 3, ..., 9, 0, arranged
in rows and columns in a purely haphazard manner, are shown below:

[Table of random numbers]

Let us now take any row and column and make 4 two-digit figures successively as
they occur. Thus if we start from the second row and first column and move
horizontally, we will find the numbers 67, 21, 52, 01. So the boys having the
above roll numbers will form the required random sample. If a particular number
occurs twice or more, then it is to be retained only once.

When the population is homogeneous in respect of a particular
characteristic, random samples yield better results than other
types of sample.
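
The two ways of drawing a random sample described above, reading a table of random numbers and the lottery method, can be sketched in Python. This is a minimal illustration under stated assumptions: the function name is hypothetical, and the digit row "67215201" is not taken from any published table but is chosen only so that it reproduces the roll numbers quoted in the example.

```python
import random


def sample_from_digit_row(digits, sample_size, population_size):
    """Read two-digit numbers left to right from a row of random digits.

    A number is kept only if it is a valid roll number (1..population_size)
    and has not been drawn already, so repeats are retained only once.
    """
    chosen = []
    for i in range(0, len(digits) - 1, 2):
        number = int(digits[i:i + 2])
        if 1 <= number <= population_size and number not in chosen:
            chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen


# Assumed row of digits, chosen to match the numbers 67, 21, 52, 01 in the text.
row = "67215201"
table_sample = sample_from_digit_row(row, sample_size=4, population_size=99)
print(table_sample)

# Lottery method: draw 4 distinct roll numbers out of 1..99 at random.
# The seed is arbitrary and only makes the draw repeatable.
rng = random.Random(1981)
lottery_sample = rng.sample(range(1, 100), k=4)
print(sorted(lottery_sample))
```

In both cases every roll number has the same chance of selection; the table method merely replaces the physical lottery by a prepared list of haphazard digits.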


(iii) Stratified Sampling. Under this method, the population
is purposively subdivided into several parts (known as Strata), then
a sample from each stratum is chosen at random. It may be noted that
the subdivision of the population is purposive, while the choosing of
the sample is purely random.

So this is a mixed sampling, combining purposive and random sampling.
Stratified sampling is used when the population is heterogeneous.
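
A stratified sample can be sketched in Python as follows. The strata, their sizes and the 10% sampling fraction are hypothetical, and drawing the same fraction from every stratum (proportional allocation) is an assumption of this sketch; the text itself does not say how many items to take from each stratum.

```python
import random


def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the given fraction from each stratum.

    `strata` maps a stratum name to the list of its members. The division
    into strata is made purposively by the investigator; only the selection
    inside each stratum is left to chance.
    """
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one item per stratum
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical heterogeneous population of 100 units split into two strata.
strata = {
    "urban": list(range(1, 61)),    # units 1..60
    "rural": list(range(61, 101)),  # units 61..100
}

picked = stratified_sample(strata, fraction=0.10)
print({name: sorted(units) for name, units in picked.items()})
```

Every stratum is represented in the sample, which is the reason for preferring this method when the population is heterogeneous.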


(iv) Systematic Sampling. In this method every r-th member
of the population, arranged serially, is drawn; the first member
is selected at random from the first r members of the population.


Example. Suppose we are to select a systematic sample of 5 boys out of 150
boys, numbered serially. Here the population is 30 times the sample size.
A number between 1 and 30 is first selected at random; suppose it is 21.
Now every subsequent 30th number is included in the sample, so the sample
of 5 boys will consist of numbers 21, 51, 81, 111, 141.
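
The systematic selection can be written as a short Python function. This is a sketch only: the function name and parameters are hypothetical, and the starting number 21 is inferred from the listed sample (51, 81, 111, 141 are each 30 apart), which is an assumption about the original example.

```python
import random


def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Select every r-th serial number, where r = population_size // sample_size.

    If no start is supplied, the first member is chosen at random from the
    first r serial numbers, as the method requires.
    """
    interval = population_size // sample_size
    if start is None:
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]


# 150 boys, sample of 5, so the interval is 30; random start assumed to be 21.
print(systematic_sample(150, 5, start=21))
```

Only the first number is random; once it is fixed, the rest of the sample is fully determined by the interval.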


Law of Statistical Regularity. 


According to the rules of the theory of probability, if from the
universe a moderately large sized sample is chosen at random, it is
likely that on an average the sample chosen will have the same
characteristics as the universe. In statistics, this law is known as the
Law of Statistical Regularity. The theory of probability tells us of the
mathematical expectation of the happening or failure of an event, and on
this basis the law of statistical regularity tells us that random selection
from the universe is very likely to give a representative sample.

It may be noted that a sample, whatever its size, will not yield
exactly the same results as a study of the universe would. The
probability of error diminishes with the increase in the number of items
taken in the sample. That is, the larger the sample, the more reliable
are the results.
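
The statement that larger random samples give more reliable results can be checked by a small simulation. The sketch below is illustrative only: the universe of 10,000 values, the sample sizes and the seed are all assumptions, and the helper name is hypothetical.

```python
import random
import statistics


def mean_abs_error(population, sample_size, repeats, rng):
    """Average distance between the sample mean and the mean of the universe,
    taken over many independent random samples of the same size."""
    true_mean = statistics.fmean(population)
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, sample_size)
        errors.append(abs(statistics.fmean(sample) - true_mean))
    return statistics.fmean(errors)


rng = random.Random(42)  # arbitrary seed, for repeatability only

# Hypothetical universe: 10,000 values spread evenly between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(10_000)]

error_small = mean_abs_error(population, sample_size=10, repeats=200, rng=rng)
error_large = mean_abs_error(population, sample_size=1000, repeats=200, rng=rng)

print(round(error_small, 2), round(error_large, 2))
```

On a run of this kind the average error for samples of 1,000 should come out many times smaller than that for samples of 10, though neither is exactly zero.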


Law of Inertia of Large Numbers. 

It is a corollary of the above law. We know that the larger
the sample, the greater would be the accuracy, for in large numbers
the chances of compensation are greater. The production of rice
in Burdwan district might vary widely from year to year, but
in West Bengal State the variation of the same production would
be less. In the same way production of the same commodity
in India would show still less variation. This phenomenon is generalised
as the Law of Inertia of Large Numbers, which implies that large
numbers are relatively more stable than small ones.

But it does not mean that the property of inertia does not allow
any change with the passage of time. It signifies that large numbers are
more constant and stable than small ones. Above all, there is no
striking change in large numbers.
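
The district and state comparison can be imitated with simulated figures. Everything in this sketch is hypothetical: 16 districts, 30 years, and yearly outputs drawn independently with mean 100 and standard deviation 20. Independence between districts is a strong assumption made here for simplicity; real harvests in neighbouring districts tend to move together, which would weaken the effect.

```python
import random
import statistics


def coefficient_of_variation(values):
    """Relative variability: standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


rng = random.Random(7)  # arbitrary seed, for repeatability only
years, districts = 30, 16

# output[y][d] is the simulated production of district d in year y.
output = [[rng.gauss(100, 20) for _ in range(districts)] for _ in range(years)]

one_district = [row[0] for row in output]   # a single district, year by year
whole_state = [sum(row) for row in output]  # total of all districts, year by year

cv_district = coefficient_of_variation(one_district)
cv_state = coefficient_of_variation(whole_state)

print(round(cv_district, 3), round(cv_state, 3))
```

The total still changes from year to year, but its relative variation is smaller than that of a single district, because increases in some districts are compensated by decreases in others.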


Statistical Errors.

The word error is used in a special sense in statistics. It is the
difference between the true value and the estimated value of a quantity.
It shows by how much an estimate of a measurement falls
short of or exceeds the true measurement. It does not mean the
same thing as mistake. In statistics mistake means a wrong calculation
or a wrong method used in collection or analysis.


Sources of Errors.

(1) Errors of origin: errors resulting from inappropriate or
faulty definition of units.

(2) Errors of inadequacy: errors due to inadequate size of sample
or incomplete information.

(3) Errors of manipulation: errors due to manipulation in
counting, measuring, weighing or approximation.


Types of Errors.
There are two types of errors: (a) Absolute error and (b) Relative
error.

Absolute error is the difference between the true value and the
estimate of a quantity, while relative error is the ratio of absolute error
to the estimate.


If the relative error is expressed as a percentage, it is called
the percentage error. For example, if the true value (price) of a quantity
is Rs. 101 and the estimated value is Rs. 100, then

Absolute error = 101 - 100 = Re. 1, relative error = 1/100 = 0.01 and
percentage error = 0.01 × 100 = 1%.
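
The three measures of error can be computed together in Python. The function name is hypothetical; the definitions follow the text, with the relative error taken as the ratio of the absolute error to the estimate (some other books divide by the true value instead).

```python
def statistical_errors(true_value, estimate):
    """Return (absolute error, relative error, percentage error).

    Absolute error   = true value - estimate
    Relative error   = absolute error / estimate
    Percentage error = relative error x 100
    """
    absolute = true_value - estimate
    relative = absolute / estimate
    percentage = relative * 100
    return absolute, relative, percentage


# The example from the text: true price Rs. 101, estimated price Rs. 100.
absolute, relative, percentage = statistical_errors(101, 100)
print(absolute, relative, percentage)
```

A negative result would indicate that the estimate exceeds the true value.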


Illustration. (A Specimen of a Blank Form). 


The form recently used by the Government of West Bengal for the
Census of Employees of State Government, Local Bodies etc. is shown
below:


Form No.§-5-7 State Govt. Office/ 
Non-Govt. Office 


GOVERNMENT OF WEST BENGAL 
BUREAU OF APPLIED ECONOMICS & STATISTICS 
Census of Employees of State Government, Local Bodies, State 
Public Sector Undertakings and Autonomous Bodies 
AS OD... . ec ceseeeeseeeeeeets $s 


INDIVIDUAL SLIP 
AR id RE 
Dame of Hmployed....s..scsccecceereccensareessreeeteaseseuseeestageeaeeaaneneenaees 
Name of Office.......... tose 
Address (in full)........eccccecsereeeeees 
Department (for Govt. Offices only)... 
Branch/Directorate (for Govt. Offices only). 
District 
Town/Village.. 
PEST SNS At RL RSI SINS VER en Se ee 
1.) * Sex! (code).....1.-5ee..08 1 2. Date of birth.......... eae 
8. Date of entry in Service...... 4, *Caste (code)....... 
Ge DGBIBMAGOD ey s.serecers stirs corevarsoes rakimanniwuyeesten ate 
6 
7. 


. Name of service......-... 
Scale of! Payssivewlcedyecvcrens 
( Form Contd. )
9. Emolument (in round rupees for March, 197 ) :

(a) Basic Pay ..........              (b) Dearness Pay ..........
(c) Ad-hoc Pay ..........             (d) Special Pay ..........
(e) Personal Pay ..........           (f) Dearness Allowance ..........
(g) Interim Dearness Allowance ...... (h) Medical Allowance ..........
(i) House Rent Allowance ..........   (j) Compensatory Allowance ..........
(k) Other emoluments (specify) : (i) ..........
                                 (ii) ..........
                                 (iii) ..........
                                 (iv) Additional Dearness Allowance ..........
(l) Total gross emolument ..........
(m) Total deduction from emolument (P.F., Recovery of Advance,
(n) Net emolument after deduction.

10. Place of posting : ..........     (a) *District (code) [ ]
                                      (b) *Rural/Urban (code) [ ]
11. Date of present posting ..........
12. Place of residence : (a) Type : Own House/Government
                             Quarter/Rented House/Rent-free
                             Quarter.
                         (b) Address : ..........
13. *Educational qualification (code) [ ] ..........
Total number of members in the family ..........

Signature of Employee ..........      Date ..........
Signature of Enumerator ..........    Date ..........

CODE
Serial No. 1—Sex : (i) Male — 1, (ii) Female —
Serial No. 4—Caste : (i) Scheduled Caste —
                     (ii) Scheduled Tribe —
                     (iii) Others —
Serial No. 8—Status : (i) Permanent —
                      (ii) Permanent Status —
                      (iii) Quasi Permanent —
                      (iv) Temporary —
                      (v) Part-time —
                      (vi) Piece-rate —
                      (vii) Contingency Menial —
                      (viii) Work-charged —
                      (ix) Contract —

Serial No. 10 (a)—District : (1) Calcutta —
                             (2) Burdwan —
                             (3) Birbhum —
                             (4) Bankura —
                             (5) Midnapore —
                             (6) Howrah —
                             (7) Hooghly —
                             (8) 24-Parganas —
                             (9) Nadia —
                             (10) Murshidabad —
                             (11) West Dinajpur —
                             (12) Malda —
                             (13) Jalpaiguri —
                             (14) Darjeeling —
                             (15) Cooch Behar —
                             (16) Purulia —

Serial No. 10 (b)—Rural/Urban : (i) Rural —
                                (ii) Urban —

Serial No. 13—Educational
qualifications : (1) Below School Final Standard —
                 (2) School Final or equivalent —
                 (3) Higher Secondary or equivalent —
                 (4) Intermediate/Twelve Class pass
                     or equivalent —
                 (5) Technical Diploma —
                 (6) Graduate (Arts, Science,
                     Commerce) —
                 (7) Graduate (Engineering) —
                 (8) Graduate (Medical—M.B.B.S.) —
                 (9) Post-Graduate (Arts, Science,
                     Commerce) —
                 (10) Post-Graduate (Engineering) —
                 (11) Post-Graduate (Medical) —
EXERCISE 1


1. Distinguish between Primary data and Secondary data.
State various methods of collecting primary data and comment on
their relative advantages. [I. C. W. A. Jan. 1972]


2. Define Secondary data. State their chief sources and point
out the dangers involved in their use and what precautions are
necessary before using them. [C. A. Nov. 1967]


3. Define statistical unit and mention the usual kinds of units
employed in statistical work. What are the essential points to be
observed in the choice of a good unit ?

Giving appropriate reasons, state what units can be used for the
following cases :

(i) Production of cotton textile industry,
(ii) Labour employed in the industry,
(iii) Consumption of electricity. [C. A. May 1967]


4. What are statistical units ? How would you define them ?
Describe and explain the various types of statistical units.


5. Distinguish between census and sample method of
investigation. What are their relative merits and defects ?


6. Distinguish between Random and Stratified Sample.


7. What are the essentials of a good questionnaire ? A certain
state has just passed an enactment making attendance at school
compulsory for all children between ages 5-15. You are asked to collect
all statistics that might be necessary for the purpose of enforcing the
Act.

State how you would proceed with the work and what statistics
you would collect. Draw up a suitable questionnaire blank form to
collect the necessary information.


8. It is required to collect information on the economic conditions
of textile mill workers in Bombay. Suggest a suitable method for
collection of Primary data. Draft a suitable questionnaire of about ten
questions for collecting this information. Also suggest how you will
proceed to carry out statistical analysis of the information collected.
[I. C. W. A. June 1976]
Introduction.

The data collected or compiled by the methods discussed in the
previous chapter are usually voluminous and crude in form, and are
known as raw data. They are not directly fit for any statistical purpose.
For the purpose of any analysis and interpretation, the data require
proper arrangement and modification.


Classification. 

It is the process of arranging data into different classes or
groups according to resemblances and similarities. An ideal
classification should be unambiguous, stable and flexible.


Object.

The objects of classification are many. It clearly shows
the points of similarity and dissimilarity. It prepares the ground for
comparison and analysis by orderly arrangement of data.


Types of Classification. 


There are two types of classification depending upon the nature
of data.

(i) classification according to attribute, if the data are of a
descriptive nature having several qualifications, e.g. males, females,
literate, illiterate etc.

(ii) classification according to class-intervals, if the data are
expressed in numerical quantities, e.g. ages of persons vary and so do
their heights and weights.


Classification according to Attributes. 


(i) Simple classification is that in which only one attribute is
present, e.g. classification of persons according to sex: males or females.
(ii) Manifold classification is that in which more than one attribute
is present simultaneously, e.g. classification of persons regarding
deafness sex-wise. Here there are two attributes,
deafness and sex. A person may be either deaf or not deaf; further,
a person may be a male or a female. The data, thus, are to be divided
into four classes: (a) males who are deaf, (b) males who are not
deaf, (c) females who are deaf, (d) females who are not deaf. The
study can be continued further if we introduce another attribute, say
religion.
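The difference between simple and manifold classification can be seen by counting a small set of records. The sketch below is in Python; the eight records are hypothetical and serve only to illustrate the four classes named above.

```python
from collections import Counter

# Hypothetical records: one (sex, hearing) pair per person.
persons = [
    ("male", "deaf"), ("male", "not deaf"), ("female", "not deaf"),
    ("female", "deaf"), ("male", "not deaf"), ("female", "not deaf"),
    ("female", "not deaf"), ("male", "deaf"),
]

# Simple classification: one attribute (sex) only.
by_sex = Counter(sex for sex, _ in persons)

# Manifold classification: both attributes taken together,
# which gives the four classes (a) to (d).
by_sex_and_hearing = Counter(persons)

print(dict(by_sex))
for (sex, hearing), count in sorted(by_sex_and_hearing.items()):
    print(f"{sex:6s} {hearing:8s} {count}")
```

Adding a third attribute, such as religion, would only mean storing a third item in each record; the counting step stays the same.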


Classification according to Class-intervals. 


This type arises when direct measurement of the data is possible.
Data relating to height, weight, production etc. come under this
category. For instance, persons having weight, say, 100—110 lbs. can form
one group, 110—120 lbs. another group and so on. In this way data
are divided into different classes, each of which is known as a
class-interval. The number of items which fall in any class-interval is
known as the class-frequency. In the class-intervals mentioned above,
the first figures in each of them are the lower limits, while the second
figures are the upper limits. The difference between the limits of a
class-interval is known as the magnitude of the class-interval. If for each
class-interval the frequencies given are aggregates of the preceding
frequencies, they are known as cumulative frequencies. The
frequencies may be cumulated either from top or from below.
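Cumulating frequencies from top and from below can be sketched in a few lines of Python. The class-frequencies used here are illustrative figures chosen for the purpose, and the variable names are our own.

```python
from itertools import accumulate

class_intervals = ["100-110", "110-120", "120-130", "130-140", "140-150"]
frequencies = [10, 5, 2, 4, 6]          # illustrative class-frequencies

# Cumulated from top: number of items below each upper limit.
less_than = list(accumulate(frequencies))

# Cumulated from below: number of items above each lower limit.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for interval, f, below, above in zip(class_intervals, frequencies,
                                     less_than, more_than):
    print(f"{interval:8s} {f:3d} {below:4d} {above:4d}")
```

The last "from top" figure and the first "from below" figure both equal the total frequency, which is a convenient check on the work.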


In general, the class-intervals should be of equal magnitude. If
the sizes of the class-intervals are unequal, they may give a misleading
impression, and in such cases comparison of one class with another
may not be possible.


Methods of forming Class-intervals. 


The class-intervals 100—110, 110—120, 120—130 etc. are
overlapping. Difficulty arises when placing an item, say 110 lbs., in the
above class-intervals: should it be placed in the class-interval
100—110 or in 110—120 ? In this method, known as the
Exclusive method, an item which is identical to the upper limit of a
class-interval is excluded from that class-interval and is included in
the next class-interval. So the item 110 lbs. will belong to the
class-interval 110—120. For all practical uses, 100—110 means 100
and less than 110, again 110—120 means 110 and less than 120, and
so on.
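The rule of the exclusive method can be written as a small function. This is only a sketch in Python; the name exclusive_class and the default values (lowest limit 100, magnitude 10) are assumptions taken from the weights example.

```python
def exclusive_class(value, lowest_limit=100, magnitude=10):
    """Return (lower, upper) limits of the class-interval holding `value`.

    Exclusive method: lower <= value < upper, so a value equal to an
    upper limit is placed in the next class-interval.
    """
    steps = (value - lowest_limit) // magnitude   # whole classes lying below the value
    lower = lowest_limit + steps * magnitude
    return lower, lower + magnitude


for weight in (100, 109.9, 110, 119):
    print(weight, "lbs. ->", exclusive_class(weight))
```

An item of exactly 110 lbs. is thus assigned to 110—120, while 109.9 lbs. stays in 100—110.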

Again the class-intervals may be formed as 100—109, 110—119,
120—129 etc. In this method, known as the Inclusive method, difficulty
also arises when there is an item lying between the upper limit of
a class and the lower limit of the next class. The above class-intervals
may also be arranged as 100—109.5, 110—119.5 and so on.


Thus, however the upper limit of the first class is written, i.e.
110, 109, 109.5 or 109.9, the class contains only values less than 110.


It may be noted that the magnitude in every case is 10.
Class-intervals with Cumulative Frequencies.


If the class-frequencies are given as cumulative class-frequencies,
then the class-intervals are expressed only by their upper limits
preceded by the word 'below' (or 'less than'), or by their lower limits
preceded by the word 'above' (or 'more than'), according as the
frequencies are cumulated from the top or from the bottom.
Before using such data for any statistical purpose, it is
necessary to convert them into the usual class-intervals with their
corresponding class-frequencies. The following example makes clear how
cumulative frequencies are converted to the usual frequencies.


(a) class-frequencies cumulated       (b) class-frequencies cumulated
    from top                              from bottom

    Weights (lb.)   Persons               Weights (lb.)   Persons
    Below 110         10                  Above 100         27
    Below 120         15                  Above 110         17
    Below 130         17                  Above 120         12
    Below 140         21                  Above 130         10
    Below 150         27                  Above 140          6


Now the usual type of class-intervals having class-frequencies
will be as follows :

    Weights (lb.)   Persons
    100—110           10
    110—120            5
    120—130            2
    130—140            4
    140—150            6
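The conversion amounts to taking successive differences. The sketch below is in Python; the function names are our own, and the 'below' figures are assumed to be 10, 15, 17, 21 and 27, which is the series consistent with the 'above' figures and with the final table.

```python
def from_less_than(cumulative):
    """Class-frequencies from 'below' (cumulated from top) figures."""
    previous = [0] + cumulative[:-1]
    return [c - p for p, c in zip(previous, cumulative)]


def from_more_than(cumulative):
    """Class-frequencies from 'above' (cumulated from bottom) figures."""
    following = cumulative[1:] + [0]
    return [c - n for c, n in zip(cumulative, following)]


below = [10, 15, 17, 21, 27]    # Below 110, 120, 130, 140, 150 (assumed series)
above = [27, 17, 12, 10, 6]     # Above 100, 110, 120, 130, 140

print(from_less_than(below))
print(from_more_than(above))
```

Both forms lead to the same class-frequencies for 100—110 up to 140—150, and their sum is the total of 27 persons.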


Statistical Series.


If things or attributes are measured, counted or weighed, and
they are placed one after another, the result is a statistical series.
In brief, a statistical series may be defined as things or attributes
arranged in some logical or systematic order.


Types. 

There are three bases of classifications of data: time, space 
and condition and consequently there are three types of statistical 
series: (i) time or historical (ii) spatial (iii) condition. In the first 
if the data collected relates to past or present. Now if the figures of 
series. In the third, data are recorded on the basis of physical
condition. If heights, weights, or ages of 100 students are recorded, the
different data, arranged in order, shall constitute a condition series.


Discrete and Continuous Series. 


Statistical series may be either discrete or continuous. A discrete
series is formed from items which are exactly measurable. Every
unit of data is separate, complete and not capable of division. For
instance, the number of students obtaining exactly 10, 14, 18 or 20
marks can easily be counted. But phenomena like height or weight cannot
be measured exactly or with absolute accuracy. So the number of
students (or individuals) having height exactly 5'2" cannot be counted.
The exact height may lie on either side of 5'2" by a hundredth part of
an inch. In such cases, we count the number of students whose heights
lie between 5'0" and 5'2". Such series are known as continuous series.


Example. 


      Discrete Series               Continuous Series

 Marks   No. of Students     Height (inch)   No. of Students
  10          12                58—60              6
  14          16                60—62             10
  18          15                62—64             13
  20           7                64—66             11

Tabulation. 


Tabulation is a systematic and scientific presentation of data 
in a suitable form for analysis and interpretation. 


After the data have been collected, they are tabulated, i.e. put in
a tabular form of columns and rows. The function of tabulation is
to arrange the classified data in an orderly manner suitable for analysis
and interpretation. Tabulation is the last stage in collection and
compilation of data, and is a kind of stepping-stone to analysis and
interpretation.
A table broadly consists of five parts :
(i) Number and Title, indicating the serial number of the table
and the subject matter of the table.
(ii) Stub, i.e. the column indicating the headings of rows.
(iii) Caption, i.e. the headings of the columns (other than the stub).
(iv) Body, i.e. the figures to be entered in the table.
(v) Foot Note, i.e. the source from which the data have been obtained.
Thus a table should be arranged as follows :


              Table No.
                Title
   --------------------------------
             Caption        Total
   --------------------------------
   Stub       Body
   --------------------------------
   Total
   --------------------------------


Types of Tabulation. 


There are mainly two types of table: Simple and Complex.
A simple table gives information regarding one or more groups
of independent questions, while a complex table gives information about
one or more inter-related questions.


A one-way table is one that answers one or more independent
questions. So it is a simple tabulation. The following table will
explain the point :


Table 1. Daily wages in Rs. obtained by 50 workers in a factory.

    Wages (Rs.)    No. of Workers
     4—6                20
     6—8                 9
     8—10               10
    10—12                7
    12—14                4
    Total               50


The table shows the number of workers belonging to each
class-interval of wages. We can now easily say that there are 20 workers
who obtain wages between 4 and 6 (the minimum range) and there are
4 workers who obtain wages between 12 and 14 (the maximum range).
So this table reveals information regarding only one characteristic of
the workers, namely their wages.

A two-way table shows the sub-division of a total and is capable of
answering two mutually dependent questions. In the above table
(no. 1), if the workers are divided sex-wise, then we would get a
two-way table as follows :
Table 2. Daily wages in Rs. obtained by 50 workers (sex-wise)

                      No. of Workers
    Wage (Rs.)     male    female    total
     4—6            12        8        20
     6—8             6        3         9
     8—10            6        4        10
    10—12            4        3         7
    12—14            4        0         4
    Total           32       18        50


The above table shows the wages obtained by workers and the
sex-wise distribution of the workers in question.
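The marginal totals of a two-way table can be checked mechanically. The following Python sketch uses the sex-wise wage figures; the split of 6 males and 3 females in the 6—8 class is an assumption inferred from the column totals (32 and 18) and the class total of 9 in Table 1, and the variable names are our own.

```python
# Rows: wage class (Rs.); values: (male, female) workers.
# The 6-8 split of 6 and 3 is an assumption inferred from the totals.
table = {
    "4-6": (12, 8),
    "6-8": (6, 3),
    "8-10": (6, 4),
    "10-12": (4, 3),
    "12-14": (4, 0),
}

row_totals = {wage: male + female for wage, (male, female) in table.items()}
male_total = sum(male for male, _ in table.values())
female_total = sum(female for _, female in table.values())
grand_total = male_total + female_total

print(f"{'Wage':8s}{'male':>6s}{'female':>8s}{'total':>7s}")
for wage, (male, female) in table.items():
    print(f"{wage:8s}{male:6d}{female:8d}{row_totals[wage]:7d}")
print(f"{'Total':8s}{male_total:6d}{female_total:8d}{grand_total:7d}")
```

The row totals reproduce the one-way table, which shows that a two-way table only sub-divides the same total.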


A three-way table sub-divides a total into three distinct categories
and is capable of answering three mutually dependent questions. In
the above table (no. 2), if the workers are divided into resident and
non-resident (in the factory area), we would get a three-way table as
given below :


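Table 3. Wages (Rs.) obtained by 50 workers in a factory
(sex-wise and resident-wise)

              Male                   Female                  Total
 Wages   resi-  non-resi-       resi-  non-resi-        resi-  non-resi-
 (Rs.)   dent   dent     total  dent   dent     total   dent   dent     total

  4—6      4     ..       ..     ..     ..       ..      ..     ..       ..
  6—8      3     ..       ..     ..     ..       ..      ..     ..       ..
  8—10     3     ..       ..     ..     ..       ..      ..     ..       ..
 10—12     4     ..       ..     ..     ..       ..      ..     ..       ..
 12—14     3     ..       ..     ..     ..       ..      ..     ..       ..
 Total    17     ..       ..     ..     ..       ..      ..     ..       ..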


The table shown above gives information about (i) wages
(in Rs.) obtained by workers, (ii) the sex-wise distribution of these
workers and (iii) the distribution of workers on the basis of residence.


If the table is again classified on the basis of different religions,
states, nationalities etc., it will give an example of Manifold
tabulation.
Rules and Precautions for Tabulation.


It is necessary to lay down certain rules and precautions for
drawing up tables. In the construction of tables, the following
points should be observed :


(1) If the data are too large, several separate tables should
be drawn instead of a single table. In such a case, a single table will
confuse the eye and may make it very difficult to follow the columns
and rows at a glance. Of course, each table should be complete in
itself and should serve a particular purpose.


(2) The table should suit the size of the paper on which it is
drawn. So the width of columns and rows should be decided
properly. Totals, averages, percentages and the numbers for
comparison should be placed as close together as possible.
Unimportant data may be placed in a miscellaneous group.


(3) For separating data of one class from that of another class,
thick lines should be used. Thin lines may be used for
separating the sub-divisions of a class.


(4) The table should be given a suitable title. The title and the
sub-headings (i.e. captions and stubs) should be self-explanatory.
The column headings (i.e. captions) should indicate the unit used,
e.g. height in inches, price in rupees, weight in pounds etc.


(5) Large figures may be approximated to thousands, lakhs
etc. This reduces unnecessary detail.


(6) Explanatory notes should always be given as footnotes and
must be complete in themselves.


The source from which the data are obtained should be indicated
in the footnote. This helps in judging the reliability of the data. In
case any discrepancy or inconsistency is found in the data, attention
must be drawn to it in a footnote, using reference marks such as
1, 2 or *, † etc.


(7) The items in the table should be arranged in some
logical order. They may be arranged in order of magnitude,
alphabetically, geographically or in any other suitable manner.


(8) Before being entered in the table, items should be checked
carefully. Arrangement should also be made for cross-checking.
Over-writing in the table should be avoided.


Example 1. 


Construct a blank table in which could be shown at different 
dates and in five industries the average wages of the four groups, 
males and females, eighteen years and over, and under eighteen 
years. (C. U. M. A. 1963) 


                    Date ..........                          Date ..........
            Under 18 yrs.      18 yrs. & over        Under 18 yrs.      18 yrs. & over
 Industry   male female total  male female total     male female total  male female total

 1.
 2.
 3.
 4.
 5.
Example 2.

Prepare a neat table, paying attention to headings, double lines,
spacing etc., showing fully the information in the following report
as clearly as possible :

During the quinquennium 1935—39, there were in Great Britain
1,775 cases of industrial diseases made up of 677 cases of
lead-poisoning, 111 of other poisoning, 144 of anthrax and 843 of gassing.
The number of deaths reported was 20 p.c. of the cases for all the four
diseases taken together; that for lead-poisoning was 135, for other
poisoning 25, and that for anthrax was 30.

During the next quinquennium 1940—44, the total number of
cases reported was 2,807 higher. But lead-poisoning cases reported fell
by 351 and anthrax cases by 35; other poisoning cases increased by 748
between the two periods. The number of deaths reported decreased by 45
for lead-poisoning but decreased only by 2 for anthrax from the pre-war
to the post-war quinquennium. In the latter period 52 deaths were
reported for other poisoning. The total number of deaths reported in
1940—44, including those from gassing, was 64 greater than in 1935—39.


Industrial diseases in Great Britain

                        Cases of                                  Deaths
 Period    lead-   other    anthrax gassing total    lead-   other    anthrax gassing total
           poison- poison-                           poison- poison-
           ing     ing                               ing     ing

 1935—39    677     111      144      843   1,775     135      25       30     165     355

 1940—44    326     859      109    3,288   4,582      90      52       28     249     419
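The figures not stated directly in the report are found by addition and subtraction. The working can be sketched in Python as below; the variable names are our own, and reading the incomplete sentence of the report as "52 deaths from other poisoning" is an assumption.

```python
# Figures stated in the report for 1935-39.
cases_35 = {"lead poisoning": 677, "other poisoning": 111, "anthrax": 144}
total_cases_35 = 1775
cases_35["gassing"] = total_cases_35 - sum(cases_35.values())

deaths_35 = {"lead poisoning": 135, "other poisoning": 25, "anthrax": 30}
total_deaths_35 = total_cases_35 * 20 // 100      # 20 p.c. of all cases
deaths_35["gassing"] = total_deaths_35 - sum(deaths_35.values())

# Changes reported for 1940-44.
total_cases_40 = total_cases_35 + 2807
cases_40 = {
    "lead poisoning": cases_35["lead poisoning"] - 351,
    "other poisoning": cases_35["other poisoning"] + 748,
    "anthrax": cases_35["anthrax"] - 35,
}
cases_40["gassing"] = total_cases_40 - sum(cases_40.values())

total_deaths_40 = total_deaths_35 + 64
deaths_40 = {
    "lead poisoning": deaths_35["lead poisoning"] - 45,
    "other poisoning": 52,      # assumed reading of the incomplete sentence
    "anthrax": deaths_35["anthrax"] - 2,
}
deaths_40["gassing"] = total_deaths_40 - sum(deaths_40.values())

for label, cases, total_c, deaths, total_d in (
    ("1935-39", cases_35, total_cases_35, deaths_35, total_deaths_35),
    ("1940-44", cases_40, total_cases_40, deaths_40, total_deaths_40),
):
    print(label, list(cases.values()), total_c, list(deaths.values()), total_d)
```

In each period the gassing figure is obtained last, as the balance of the total after the other three diseases are taken out.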


Mechanical Aid to Computation 


Mechanisation is becoming more and more useful in the
computation of practical problems. If the problem is a small one
we can do it manually, but if the problem is large or complicated
we take the help of mechanical aids to computation. Mechanical aids
to computation include machines of the following types :


(1) Listing and adding machines.
(2) Calculating machines.
(3) Accounting machines.
(4) Tabulating and Statistical machines.


Adding machines are generally either hand-operated or
electrically operated. An ordinary adding machine can only add;
when the machine also prints the data on a tape in addition to adding,
it is called an adding and listing machine.


Calculating machines can do multiplication and division in
addition to adding. They can also subtract. These
machines are operated either manually or electrically.

Accounting machines can do calculations and post the result
of the calculation on a card at an appropriate place. Work such as cash
registers, cost sheet preparation, stores ledgers, journal books, material
issue analysis etc. is done by this type of machine.

Tabulating and Statistical machines are punched card processing
machines. Here all basic information is punched on a card and
the cards are fed into machines which read the instructions,
process the data fed in and give the result in the form of a print-out.
The essential functions of punched card machines are performed by
the following :

(i) Card punch.

(ii) Card punch verifier.

(iii) Interpreter.

(iv) Sorter.

(v) Collator.

(vi) Reproducer.

(vii) Calculator.

(viii) Tabulator.


The computer is an improved type of tabulating and statistical
machine. Two types of computers are generally used in computation:
(a) Analog Computers and (b) Digital Computers.
Analog computers are used to measure physical quantities such as
energy, velocity etc. Digital computers are used in quantitative
analysis. They can do all sorts of numerical calculation if proper
instructions are given to them.


EXERCISE 2 
1. Define classification. What part does it play in Statistics ?
State the different methods of classification of statistical data.


2. Discuss the function and importance of tabulation in a
scheme of statistical investigation. What precautions should be taken
in tabulation of data ?

3. What is a statistical table ? Explain clearly the essentials of
a good table.

Prepare a blank table showing the particulars relating to students
of the Bombay University classified according to their ages, sex, faculty
and three important religions. [C. A. Nov. 1969]


4. Define a statistical table and state the essentials of a good
table.

5. Draw a blank table showing exports and imports during the
years 1960, 1961, 1962, 1963, 1964 relating to Ports Bombay, Calcutta,
Madras and other Ports. The table should provide for the values,
the balance of trade and the totals for each year. [C. A. Nov. 1968]


6. (a) Discuss the general rules which should be followed in
tabulating statistical data and state the various types of tabulation
and their uses.

(b) Draw up a blank table with a suitable title, headings,
double lines etc. in which could be shown the population of India by
states during the 1961 census, classified according to sex, age groups
and the two principal livelihood categories: Agricultural and
non-agricultural. [I. C. W. A. Jan. 1968]


7. The total number of accidents in Southern Railway in 1960
was 3500 and it decreased by 300 in 1961 and by 700 in 1962. The
total number of accidents in the metre gauge section showed a
progressive increase from 1960 to 1962. It was 245 in 1960; 346 in 1961;
and 428 in 1962. In the metre gauge section, "Not Compensated"
cases were 49 in 1960, 77 in 1961, and 108 in 1962. "Compensated"
cases in the broad gauge section were 2867, 2587 and 2152 in those
three years respectively.

From the above report, you are required to prepare a neat table
as per the rules of tabulation. [C. A. Nov. 1971]


8. (a) What are the different parts of a statistical table ?

(b) Present the following information in a concise tabular
form and indicate which type of lamp shows the greatest wastage
during manufacture :

"Lamps are rejected at several manufacturing stages for different
faults. 12,000 glass tubes are supplied to make 40-watt, 60-watt and
100-watt lamps in the ratio 1 : 2 : 3. At the stage I, 10% of the 40-watt,
4% of the 60-watt, and 5% of the 100-watt bulbs are broken. At the stage
II, about 1% of the remaining lamps have broken filaments. At the
stage III, 100-watt lamps have badly soldered caps and half as many
have crooked caps; twice as many 40-watt and 60-watt lamps
have these faults. At the stage IV, about 3% are rejected for bad
type making and 1 in every 100 are broken in the packing which
follows." [I. C. W. A. July 1971]
9. Draw up a blank table in which could be shown the number
of persons employed in six industries on two different dates,
distinguishing males from females and, among the latter, singles, married
and widows. [I. C. W. A. Jan. 1973]

10. In the Sutton Coldfield region the number of cinema
admissions in the four quarters of 1950 was ('000) : 11,008; 9,998;
9,933 and 9,406. For the four quarters of 1951 they amounted to
('000) : 10,521; 9,677; 9,869 and 9,568. The corresponding number
of television licences per 1000 population was 12; 18; 24; 37; 62;
64; 69 and 81. In the Holme Moss region cinema admissions during
the eight quarters of the two years were ('000) : 18,290; 16,420;
16,973; 15,937; 17,940; 16,481; 16,336 and 16,136. The quarterly
number of television licences per 1000 of the population in the Holme
Moss region during 1950 and 1951 was 2, 3, 4, 6, 9, 13, 14 and 29.
Transmission of television started in the Holme Moss region during
the fourth quarter of 1951.

Arrange the above information in a tabular form. What do
you deduce from these figures ?

11. Prepare a blank table to show the exports of three
companies A, B, C to the five countries U.K., U.S.A., U.S.S.R., France
and West Germany, in each of the years 1970—74.
[I. C. W. A. Dec. 1975]

12. What is the purpose of tabulation of statistical data ?
What general rules should be observed in constructing a statistical
table ? [I. C. W. A. Dec. 1974]

13. Draw up a blank table to show the numbers of candidates,
sex-wise, appearing for the Pre-University, First Year, Second Year and
Third Year examinations of a University in the faculties of Arts, Science
and Commerce in a certain year. [I. C. W. A. Dec. 1974]

14. State briefly the requirements of a good statistical table.

Prepare a blank table to show the distribution of population of
the various States and Union Territories of India, according to sex and
literacy. [I. C. W. A. June, 76]

15. Represent the following information in suitable tabular form
with proper rulings and headings :

The annual report of the Ishapur Public Library reveals the
following points regarding the reading-habits of its members.

Out of a total of 3,713 books issued to the members in the month of
June 1970, 2,100 were fictions. There were 467 members of the library
during the period and they were classified into five classes A, B, C, D
and E. The numbers of members belonging to the first four classes were
respectively 15, 176, 98 and 129; and the numbers of fictions issued to
them were 103, 1187, 647 and 58 respectively. Numbers of books,
other than textbooks and fictions, issued to these four classes of
members were respectively 4, 390, 217 and 341. Textbooks were
issued only to members belonging to the classes C, D and E and the
numbers of textbooks issued to them were respectively 3, 317 and 160.

During the same period, 1246 periodicals were issued. These
included 396 technical journals of which 36 were issued to members of
class B, 45 to class D and 315 to class E.

To members of the classes B, C, D and E the numbers of other
journals issued were 419, 26, 231 and 99 respectively.

The report, however, showed an increase of 3.9% in the number
of books issued over last month, though there was a corresponding
decrease of 6.1% in the number of periodicals and journals issued to
members. [I. C. W. A. June, 77]


16. Represent the following data in a tabular form :

A firm processes a certain raw material by the use of two major
types of equipment, called stills and retorts. Four different
production processes are available to the firm. One unit of each of the
processes I, II, III and IV will weekly treat 100 tons of raw material.
Processes I and II will absorb 7 per cent and 5 per cent of the weekly
capacity of the firm's stills and 3 per cent and 5 per cent respectively
of the weekly capacity of the retorts. Processes III and IV will absorb
3 per cent and 2 per cent of the weekly capacity of the firm's stills
and 10 per cent and 15 per cent respectively of the weekly capacity of
the retorts. The four processes I, II, III and IV will respectively yield
a final product worth $1100, $1120, $1180 and $1150 but will
consume raw material of value $1000, $1000, $1000 and $1000
respectively. The other direct costs required by the four processes
are worth $50, $60, $40 and $60 respectively.
[I. C. W. A. June, 79]
17. Present the following data in a tabular form :

A certain manufacturer produces three different products 1, 2
and 3. The product 1 can be manufactured in one of the three plants
A, B and C. However, the product 2 can be manufactured in either
plant B or plant C, whereas plant A or B can manufacture product 3.
The plant A can manufacture per hour 10 pieces of 1 or 20 pieces of 3.
20 pieces of 2, 15 pieces of 1 or 16 pieces of 3 can be manufactured
per hour in plant B,
whereas C can produce 20 pieces of 1 or 18 pieces of 2 per hour.
Wage rates per hour are Rs. 1.50 at A, Rs. 3.00 at B and Rs. 2.00 at C.
The costs of running plants A, B, C are respectively Rs. 200/-, Rs. 100/-,
Rs. 250/- per hour. The materials and other costs directly related
to the production of one piece of the products are respectively Rs. 10
for 1, Rs. 12 for 2 and Rs. 15 for 3. The company plans to market
the product 1 for Rs. 15 per piece, the product 2 for Rs. 18 per piece
and the product 3 for Rs. 20 per piece. [I. C. W. A. Dec. 79]


PRESENTATION OF DATA, 
GRAPHS AND CHARTS 


Introduction.


In the previous chapters we have discussed how huge
statistical data are condensed and presented in the intelligible form
of a table. Various statistical methods like classification,
tabulation, averages and index numbers etc. reduce the complexity of
statistical data to a simpler form. Now classification and tabulation are
meant for systematic presentation of data, while measures of central
tendency and index numbers (these will be discussed in later
chapters) help in comparing data by converting them into single figures.
These methods have their own limitations. One more effective method of
representing data is by the help of graphs, charts and diagrams, by
means of which the true significance of a set of figures can be easily
grasped. Of course, it is true that graphs and charts add nothing to
the information already obtained, but they bring out clearly the relative
importance of different figures, and are often necessary in finding out
the trend of the values or variations in the values in relation to time.
The special feature of diagrams and graphs is that they present dry
and uninteresting statistical facts in the shape of attractive and
appealing pictures and charts.


Usefulness.


The advantage of diagrammatic presentation of data is
that diagrams and charts are attractive to common people. Common
people, in general, avoid figures but always look for pictures
and diagrams, even when reading general books. Graphical
representations are particularly useful when comparisons are to be made
between two or more sets of data. To an economist, difficult theories
can be easily understood if proper diagrams are used. Diagrams
save much valuable time, which would otherwise be lost in grasping the
significance of numerical data.

Limitations, ) 
The charts do not show details, which is possible in 
a table. Graphical representation reveals only the apron’ 
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position, where as in a table, we can show exact figures. If a © 
statistician or an investigator wants to do an exhaustive study of 
figures, then the utility of diagrams is not much. Representation of 
data by means of graphs or diagrams may often give misleading impres- 
sions to people. Advertisers or politicians misuse this form of 
representations of facts and try to mislead the common people. 


Functions.

The following two functions are served by graphs and diagrams :

(1) They make complex data simple and easily understandable.

(2) They help to compare related data, by placing the graphic
or diagrammatic representations near each other.


GRAPHIC REPRESENTATION 


Graphs are useful for representing data relating to time or for 
representing frequency distribution. 


Construction. 


Two straight lines are drawn cutting each other at right
angles. The horizontal line (XX’) is called the abscissa or X-axis,
while the vertical line (YY’) is known as the ordinate or Y-axis. The
point (O) at which the lines meet is known as the origin. (See Fig. 1.)

Distances measured from O towards the right or upwards are reckoned
as positive, while those measured towards the left or downwards are
reckoned as negative. Thus XX’ and YY’ divide the plane, i.e., the graph
paper, into four parts, known as Quadrants. All points in the plane are
located by two co-ordinates drawn parallel to the axes.

Fig. 1

For each axis, a convenient scale is chosen. It is not necessary
that the two scales of the axes should be the same. In the above graph,
the scales of the axes are the same and the following points have been plotted.

Points     x      y

  P        4      3
  Q       -4      2
  R       +3     -3
  S        2     -2

The dotted lines locating P, Q, R and S need not be shown.
Only the points should be shown. Thus we find that a point on
a graph paper is a function of two variables.


A graph must be accompanied by its heading showing in detail 
the nature of the graph. 


GRAPHS OF TIME SERIES 
Natural Scale. 


Graphs of continuous time series are known as Historigram, 
which may be constructed on natural scale or on ratio scale. First 
on natural scale, we will discuss the following graphs. 


1. Absolute Historigram of one variable : changes of a single
variable over a period of time.

2. Absolute Historigram of two or more variables : changes of
two or more variables over a period of time.

3. Index Historigram : if the values are represented by index
numbers and if these indices, instead of the actual values, are plotted,
then an Index Historigram is obtained.


(1) Absolute Historigram (or Line Chart or Linear Graph). 


Let x and y be two variables (discrete or continuous) such
that for each value of x there corresponds a value of y. If, for each
value of x along X-axis, the corresponding value of y along Y-axis is
plotted, then corresponding to a set of values of x we get a set of points.
The straight lines or curve obtained by joining these
points is known as the Absolute Historigram.


Example. The monthly productions of bi-cycles in a certain
factory are as follows :

Jan. 70, Feb. 90, March 80, April 120, May 100, June 120,
July 110, Aug. 125, Sept. 130, Oct. 150, Nov. 100.

Draw a Historigram (absolute) to represent the above data.

Since the values of both the variables are positive, we shall draw
only one quadrant (i.e., 1st quadrant), in which both the variables are
positive.

We represent the months along X-axis and the corresponding
productions along Y-axis according to the scale mentioned below :

1 division along X-axis = 1 month, 1 division along Y-axis = 10 units.

Graph showing monthly productions of Bi-cycles

Fig. 2 (X-axis : months, J to D)


Use of False Base Line. 


If the size of the items is big and the vertical scale starts from zero,
the curve would be almost at the top of the graph paper, as shown
above. In order to make the utmost use of the space of the graph and to
be technically correct also, the false base line is used.

Generally the vertical scale is broken into two parts and some
blank space is left in between them. The lower part starts from zero
and the upper part starts with a value equal or nearly equal to the minimum
value of the variable. Usually saw-tooth lines are used to break the
vertical scale. The false base line is used to represent the above
example graphically.

Fig. 3 (X-axis : months, J to D)
(2) Absolute Historigram of Two or More Variables.

Two or more variables (of the same unit) can also be drawn on the
same graph paper. The procedure of drawing is the same as in the
previous case. In such a case, we will find two or more curves.

Example. The table below gives the income, expenditure and
profit (or loss) of a shop during the whole year :

Months    Income (Rs.)    Expenditure (Rs.)    Profit (Rs.)

Jan.          700               500                200
Feb.          800               550                250
March         500               600               -100
April         550               700               -150
May           600               500                100
June          550               400                150
July          700               500                200
Aug.          800               550                250
Sept.         750               400                350
Oct.          700               500                200
Nov.          800               550                250
Dec.          750               450                300

Draw income, expenditure and profit graphically on the same
graph paper.

Scale : 1 division along X-axis = 1 month
        1 division along Y-axis = Rs. 100 (profit)
        1 division along Y'-axis = Rs. 100 (loss)

Graphs of Income, Expenditure and Profit

Fig. 4

Note. The false base line cannot be used in this type of graph,
where profit and loss are to be shown.


(3) Index Historigram. 


The drawing is similar to the previous graph, but only the 
index numbers are to be plotted instead of variables. 


Example.
Small Savings during the years 1960-1964 are shown below,
including the index numbers. Draw the index historigram.


Year  Savings(Rs.) Index (1961=100) 


1960 250 125 
1961 200 100 
1962 300 150 
1963 316 158 
1964 400 200 


Index Historigram of Small Savings (1960-1964), (1961 = 100) 


Fig. 5 


Note. Index Historigram may be also of two or more variables. 
The drawing is similar to the drawing of Absolute Historigram of two 
or more variables. 
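The index column of the example above can be checked with a short calculation. The following is only a sketch in Python (a language the text itself does not use); the function name `index_numbers` is an illustrative choice, and the base year 1961 is taken from the table heading.

```python
# Savings (Rs.) by year, as given in the Small Savings example.
savings = {1960: 250, 1961: 200, 1962: 300, 1963: 316, 1964: 400}

def index_numbers(values, base_year):
    """Express every value as a percentage of the base-year value."""
    base = values[base_year]
    return {year: round(100 * v / base) for year, v in values.items()}

# These are the figures to be plotted in the index historigram.
indices = index_numbers(savings, base_year=1961)
print(indices)
```

Plotting `indices` against the years, instead of the savings themselves, gives the Index Historigram.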


OTHER GRAPHS 
There are certain other graphs, which are becoming popular. 
(1) Mixed Graphs. 


The graphs are prepared to study inter-related values.
In such graphs one variable is usually shown by a bar-diagram and
the other by a curve.

Example.

Quantity and Price of a Commodity (1965-70) are shown
below. Draw a mixed graph.

Year              1965
Quantity (Kg.)      55
Price (Rs.)        250    190

The above figures can be represented by means of a mixed graph.
The quantity will be represented by vertical bars (the length of the
bars is proportional to the values they represent) and the price will be
shown by a historigram.

Quantity and Price of a Commodity (1965-1970)

Fig. 6 (X-axis : years, 1965 to 1970)


From the above graph, the variations of quantity and the price, 
year to year, can be studied. The two variables move in the same 
direction. 


(2) Zone Graph. 


Sometimes it becomes necessary to represent the maximum
and minimum values of a given set of variables by means
of a graph. In such a case, we are to put two marks, one for
the maximum and the other for the minimum value, against the appropriate
date. The space between these points is made prominent by
thickening the lines.

Example.

Average prices of gold in a certain city (per tola), in terms
of Rupees, are shown below :

Year                   1936     37      38      39      40

Maximum Price (Rs.)    36.75   35.50   35.20   37.70   38.10

Minimum Price (Rs.)    31.25   33.94   34.25   34.75   35.12

Maximum and Minimum Prices of Gold

Fig. 7


(3) Band Curve.

It is a type of linear graph, used to represent the total
of the component parts of some data spread over a period of
time. The components are plotted one above the other, using
different shades in the spaces formed. This type of graph is useful
for studying a total cost divided into various component parts.

Example.

Represent the following data graphically : Advances granted by
Primary Agricultural Credit Societies (in crores of Rs.)

Year        Bombay    Madras    All-India
(1)          (2)       (3)        (4)
1946-47      1.70      3.47       9.03
1947-48      2.22      4.40      10.45
1948-49      3.29      4.96      14.40
1949-50      5.29      6.44      17.99
1950-51      6.90      7.65      22.90
1951-52      8.12      7.33      24.21

(Source : All India Rural Credit Survey, 1954)

Calculation for Plotting the Data

Year        Col. (2) + Col. (3)    Col. (2) + Col. (3) + Col. (4)
1946-47            5.17                      14.20
1947-48            6.62                      17.07
1948-49            8.25                      22.65
1949-50           11.73                      29.72
1950-51           14.55                      37.45
1951-52           15.45                      39.66

Fig. 8 (X-axis : years, 1946-47 to 1951-52)
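The running totals needed for a band curve are simply cumulative sums of the components, year by year. A minimal sketch in Python follows; the helper name `band_levels` is hypothetical, and the figures are those of the Bombay, Madras and All-India columns of the example.

```python
from itertools import accumulate

# (Bombay, Madras, All-India) advances in crores of Rs.
advances = {
    "1946-47": (1.70, 3.47, 9.03),
    "1947-48": (2.22, 4.40, 10.45),
    "1948-49": (3.29, 4.96, 14.40),
    "1949-50": (5.29, 6.44, 17.99),
    "1950-51": (6.90, 7.65, 22.90),
    "1951-52": (8.12, 7.33, 24.21),
}

def band_levels(components):
    """Heights at which the successive curves of the band are plotted."""
    return [round(level, 2) for level in accumulate(components)]

levels = {year: band_levels(parts) for year, parts in advances.items()}
for year, heights in levels.items():
    print(year, heights)
```

The first number of each row is plotted as the lowest curve, the second as the middle curve and the third as the top curve; the spaces between them are then shaded.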
Ratio Scale.

Logarithmic Charts or Ratio Charts. In the graphs drawn
before, we have used the natural scale, in which the same number of units
is represented by the same distance on the graph. For example, if the pairs of
numbers 20, 30 and 1000, 1010 are plotted on a natural scale
(or ordinary) graph, then the vertical distance between the two points of
each pair will be the same, since each pair differs only by 10.

Again, when the variable changes from 20 to 30, there is an
increase of 10, i.e., 50%, while for the change from 1000 to 1010 the
increment is 1%. Now we see that the relative changes are different,
though the actual change is the same. A natural scale graph shows
only the actual change and not the relative change, while a
logarithmic graph shows only the relative change and not the actual
change.
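The contrast between the two scales can be seen numerically. The sketch below (Python, with an illustrative function name `changes`) computes, for each pair of numbers quoted above, the actual change, the percentage change, and the vertical gap the pair would occupy on a logarithmic scale.

```python
import math

def changes(old, new):
    """Return (actual change, percentage change, gap on a log10 scale)."""
    actual = new - old
    percent = 100 * (new - old) / old
    log_gap = math.log10(new) - math.log10(old)
    return actual, percent, log_gap

for old, new in [(20, 30), (1000, 1010)]:
    actual, percent, log_gap = changes(old, new)
    print(f"{old} to {new}: actual {actual}, relative {percent:.0f}%, log gap {log_gap:.4f}")
```

Both pairs show the same actual change, but the gap on the logarithmic scale is much larger for the pair 20, 30, in keeping with its larger relative change.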


In order to compare such relative changes over a period of time,
a special form of graph known as Logarithmic Graph (or Chart) or Ratio
Chart is used. In this graph the vertical axis (i.e., Y-axis) represents
the logarithm of the values of the dependent variable, while the
horizontal axis (i.e., X-axis) represents the years or months as in the
case of arithmetic scale.

For, if y be the dependent variable whose values are represented
on the Y-axis, then in a ratio chart the value of log y (and not of y) is
tabulated.

Logarithmic Chart is also sometimes called Semi-logarithmic
chart, since the vertical scale is logarithmic while the horizontal scale
remains absolute (or arithmetic). Equal vertical changes on
a logarithmic chart represent equal percentage changes and not equal
actual changes.


Characteristics.

1. Ratio chart has no zero (point), since it compares the relative
changes. A natural scale has zero, since it compares absolute values.
Naturally, the zero line is essential in the case of the natural scale but
not for the logarithmic scale.

2. They are particularly useful for representing graphically a
very wide range of values, e.g. values ranging from 10,000 to
1,00,00,000.

3. Equal vertical distances in a logarithmic graph represent the
same relative change, while in the case of a natural scale graph they
represent equal absolute changes.

4. A natural scale graph can show negative values, while
a logarithmic graph cannot show negative values since it has no zero
point.

5. Logarithmic scale is specially important in the case of index
historigrams. They should generally be drawn on ratio scales, because
index numbers are more concerned with proportionate changes than
with actual ones.

6. Ratio scale facilitates extrapolation, i.e. finding out a possible future
figure, if the data are organic in character. For example, if the
population figures of a certain country are plotted on a ratio scale, then
the curve obtained may be extended in continuation with its trend
beyond the last date to the next date, to obtain a fairly accurate
estimate of the next figure.

7. A logarithmic chart can be drawn either on an ordinary
graph paper by plotting the logarithms of the values of the dependent
variable, or by plotting the actual values on a semi-logarithmic graph paper.


Example. 


The following table gives the total units produced at the beginning
of the different years. Represent the data graphically and estimate
the mid-year values for 1949 and 1953.

Year               1947   '48   '49   '50   '51   '52    '53    '54    '55
Units produced       20    62   147   300   586   811   1104   1425   1755

[I. C. W. A. Jan. 1965]

Since the units produced differ widely, i.e. from 20 in 1947 to 1755 in
1955, the ratio chart is suitable for representation of the above data.

Calculation.

Year     Units (y)     Log y (approximate)

1947         20              1.30
1948         62              1.79
1949        147              2.17
1950        300              2.48
1951        586              2.77
1952        811              2.91
1953       1104              3.04
1954       1425              3.15
1955       1755              3.24

Fig. 9 (X-axis : years, 1947 to 1955)
From the graph, the values of log y at the middle of 1949 and of 1953
are read off with the help of two horizontal lines,
so the required estimates are 209 and 1259.
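Reading a mid-year value from a ratio chart amounts to averaging the logarithms of the two neighbouring values and taking the antilogarithm. The sketch below does this by calculation rather than by reading a graph; it assumes a straight line between consecutive points on the chart, and the function name `mid_year_estimate` is only illustrative.

```python
import math

# Units produced at the beginning of each year (from the example).
units = {1947: 20, 1948: 62, 1949: 147, 1950: 300, 1951: 586,
         1952: 811, 1953: 1104, 1954: 1425, 1955: 1755}

# Logarithms rounded to two places, as in the calculation table.
logs = {year: round(math.log10(u), 2) for year, u in units.items()}

def mid_year_estimate(year):
    """Value half-way between the beginning of `year` and of the next year,
    found by averaging the logarithms (a straight line on the ratio chart)."""
    log_mid = (math.log10(units[year]) + math.log10(units[year + 1])) / 2
    return 10 ** log_mid

print(logs)
print(round(mid_year_estimate(1949)), round(mid_year_estimate(1953)))
```

The results should lie near the values read from the chart; small differences arise from the rounding of the logarithms and from the accuracy with which a graph can be read.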
GRAPHS OF FREQUENCY DISTRIBUTION

In constructing frequency graphs, the values of the variable are
measured on X-axis, while the corresponding frequencies are taken on
Y-axis. The following types of graphs can be constructed to represent
a frequency distribution :

1. Individual Observation Series Graph : Let us take any
set of data relating to individual items. For example, the following are
the marks obtained by 15 students in a certain class-test of Mathematics.

(Full Marks : 50)

Serial no.   Marks     Serial no.   Marks     Serial no.   Marks
    1          20          6          32         11          39
    2          22          7          34         12          40
    3          25          8          37         13          49
    4          27          9          38         14          42
    5          32         10          38         15          46

To represent the data graphically.

Marks obtained by 15 students in Mathematics

Fig. 10 (X-axis : serial no., 1 to 15)


2. Discrete Series Graph : A discrete series is one in which
an item cannot assume any value in a class-interval. The value of the
item is fixed and definite. This type of series is represented by Line or
Bar-Frequency Diagrams.

Bars may be vertical or horizontal. The length of the bars is
proportional to the values they represent. The base line should be
zero, when bar-charts are used for comparison.

Example.

Marks
No. of Students

Fig. 11 (Y-axis : students)

Note. It may be noted from the above graph that there are no
students securing marks between 1 & 2 or 3 & 4 etc. For horizontal
bar-graph see Fig. 18.


3. Continuous Series Graph : It is one in which an item can
assume any value within a particular class-interval.

(i) Histogram. The X-axis is marked off to represent the class-intervals,
and on these markings rectangles are drawn
by taking the lengths of the class-intervals as breadth and the corresponding
frequencies as height. Thus a series of rectangles is obtained
whose total area represents the total of the class-frequencies. The
figure thus obtained is known as Histogram.

Example.

Marks              20-30   30-40   40-50   50-60   60-70   70-80   Total

No. of Students    15   55

The X-axis is marked off suitably to represent the class-intervals
of marks (in question). Now, on these
class-intervals rectangles are drawn
taking the corresponding frequencies
(in this case, No. of Students) as height.

Fig. 12 (X-axis : marks, 20 to 80 ; Y-axis : students)

Note. We can also estimate the mode
of a frequency distribution with the help
of a histogram, as shown in the chapter of
Average.

Histogram (when class-intervals
are unequal). If the class-intervals are
unequal, the frequencies must be adjusted
before constructing the histogram.
Adjustments are to be made with respect
to the lowest class-interval. For instance,
if one class-interval is twice as wide as
the lowest class-interval, then we are to
divide the height of its rectangle by
two, and if again it is three times as wide,
then we are to divide the height of its
rectangle by three, and so on.


Example.
Represent the following data by means of a histogram :

Weekly Wages    No. of       Weekly Wages    No. of
(Rs.)           Workers      (Rs.)           Workers
10-15              7         30-40             12
15-20             19         40-60             12
20-25             27         60-80              8
25-30             15

[C. A. 1963]

Since the class-intervals are unequal, frequencies are adjusted
as follows :

Weekly Wages (Rs.)   10-15   15-20   20-25   25-30   30-35   35-40   40-45   45-50

No. of Workers

Fig. 13 (X-axis : weekly wages in Rs., 0 to 80)
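The adjustment for unequal class-intervals can be written as a rule : divide each frequency by the number of times its class is wider than the narrowest class. A minimal sketch in Python, using the wage data of the example (the name `adjusted_heights` is an illustrative choice) :

```python
# (lower limit, upper limit, number of workers) for each class.
classes = [(10, 15, 7), (15, 20, 19), (20, 25, 27), (25, 30, 15),
           (30, 40, 12), (40, 60, 12), (60, 80, 8)]

def adjusted_heights(classes):
    """Height of each rectangle, taking the narrowest class as the unit width."""
    base_width = min(upper - lower for lower, upper, _ in classes)
    heights = []
    for lower, upper, frequency in classes:
        times_wider = (upper - lower) / base_width
        heights.append(frequency / times_wider)
    return heights

heights = adjusted_heights(classes)
for (lower, upper, _), height in zip(classes, heights):
    print(f"{lower}-{upper}: height {height}")
```

With these heights the area of every rectangle, and not its height, is proportional to the frequency of the class, which is what a histogram requires.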


Histogram (when only mid-points are given). When only mid-points
(of class-intervals) are given, we are to ascertain the upper and
lower limits of the various classes and then to construct the histogram.

Example.

Draw a histogram of the following frequency distribution :

Life of Electric Lamps
(in hours) mid-values    1010   1030   1050   1070   1090
Firm                       10    130    482    360     18

[I. C. W. A. 1968]

From the mid-values, the class-limits are ascertained as given
below :

Life of Electric Lamps   1000-1020   1020-1040   1040-1060   1060-1080   1080-1100

Frequency                    10         130         482         360          18

Now the histogram can be drawn easily, similar to Fig. 13.
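When the mid-points are equally spaced, each class extends half the common spacing on either side of its mid-point. The sketch below applies this to mid-values 1010, 1030, ..., 1090; equal spacing is an assumption of the sketch, and `class_limits` is a hypothetical helper name.

```python
def class_limits(mid_points):
    """Lower and upper limits of each class, assuming equally spaced mid-points."""
    spacing = mid_points[1] - mid_points[0]
    half = spacing / 2
    return [(m - half, m + half) for m in mid_points]

limits = class_limits([1010, 1030, 1050, 1070, 1090])
for lower, upper in limits:
    print(f"{lower:.0f}-{upper:.0f}")
```

The limits so found are then used as the bases of the rectangles of the histogram.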


Histogram (for discontinuous grouped data). For discontinuous
grouped data, first we are to make class-boundaries (discussed in
detail in the chapter of Frequency Distribution) and then to draw the
histogram by the usual method.

Example.

Class-limits   10-19   20-29   30-39   40-49   50-59   60-69   70-79   80-89

Frequency

Class-boundaries   9.5-19.5   19.5-29.5   29.5-39.5   39.5-49.5   49.5-59.5   59.5-69.5   69.5-79.5   79.5-89.5

Frequency   5   9   14   25   15   8   4

Now taking the class-boundaries on X-axis, and the corresponding
frequencies as heights of rectangles, draw the histogram.


(ii) Frequency Polygon.* The line chart obtained by joining
successively the middle-points of the tops (uppermost sides) of the
rectangles in a histogram by straight lines is known as Frequency Polygon.
It is customary to join the extreme two middle-points to the base line
at the middle-points of the next class-intervals. The area covered by
the frequency polygon is nearly the same as that covered by the histogram.

The dotted line of Figure (12) represents the Frequency Polygon.

The frequency polygon can also be drawn without the help of
a histogram. Points are plotted by taking the middle-point of each
class-interval as abscissa (x-coordinate) and the corresponding
frequency as ordinate (y-coordinate). Then the line chart obtained by
joining such points by straight lines is known as Frequency Polygon.

* Polygon literally means many angles. In statistics it means a curve
representing a frequency distribution.


Example.

Class-intervals (marks)    Mid-points    No. of Students

20-30
30-40
40-50
50-60
60-70
70-80

Total

Fig. 14

(iii) Frequency Curve. The frequency polygon consists of
sharp turns, ups and downs. To remove these sharp features of a
polygon, it becomes necessary to smooth it. There is no definite rule
for smoothing the polygon. Figure (14) shows the smoothed curve (i.e.
the frequency curve).

(iv) Ogive or Cumulative Frequency Polygon. If the
cumulative frequencies are plotted against the class-boundaries and the
successive points are joined by straight lines, we get what is known
as Ogive (or cumulative frequency polygon). There are two types of
Ogive :

(a) Less than type : Cumulative frequencies from below are
plotted against the upper class-boundaries.

(b) Greater than type : Cumulative frequencies from above are
plotted against the corresponding lower boundaries.

The former is known as less than type, because the ordinate of
any point on the curve (obtained) indicates the frequency of all values
less than or equal to the corresponding value of the variable represented
by the abscissa of the point. Similarly, the latter one is known as
greater than type.


Frequency distribution of marks obtained by 170 students

                                                  Cumulative Frequency
Class-intervals  Class-boundaries  Frequency  from below         from above
(marks)                                       (less than 30.5,   (greater than 20.5,
                                              40.5 etc.)         30.5 etc.)

21-30            20.5-30.5            15           15               170
31-40            30.5-40.5            25           40               155
41-50            40.5-50.5            40           80               130
51-60            50.5-60.5            60          140                90
61-70            60.5-70.5            20          160                30
71-80            70.5-80.5            10          170                10

Total                                170

Ogive of Marks Obtained by 170 Students

Fig. 15

Note. From the above figure, it is noticed that the ogives
cut at a point whose ordinate is 85, i.e., half the total frequency, and the
corresponding abscissa is 51.33, which is the median of the above
frequency distribution (see the sum on median, in the chapter of
Average). Even if only one ogive is drawn, the median can be determined
by locating the abscissa of the point on the curve whose cumulative
frequency is N/2. Similarly, the abscissae of the points on the less
than type ogive corresponding to the cumulative frequencies N/4 and 3N/4
give the Q1 (first quartile) and Q3 (third quartile) respectively. (Q1, Q3
will be discussed after median in the chapter of Average).
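The cumulative frequencies, and the reading of the median and quartiles from the less than type ogive, can be reproduced by calculation. The sketch below assumes straight lines between the plotted points (as in an ogive drawn with straight lines) and uses the marks of the 170 students; the function name `value_at` is illustrative only.

```python
from itertools import accumulate

boundaries = [20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
frequencies = [15, 25, 40, 60, 20, 10]

# Less than type: plotted against the upper class-boundaries.
less_than = list(accumulate(frequencies))
# Greater than type: plotted against the lower class-boundaries.
greater_than = list(accumulate(frequencies[::-1]))[::-1]

def value_at(target):
    """Abscissa on the less than ogive where the cumulative frequency is `target`."""
    previous = 0
    for i, cumulative in enumerate(less_than):
        if target <= cumulative:
            lower, upper = boundaries[i], boundaries[i + 1]
            share = (target - previous) / (cumulative - previous)
            return lower + share * (upper - lower)
        previous = cumulative
    raise ValueError("target exceeds the total frequency")

n = less_than[-1]
median = value_at(n / 2)
q1 = value_at(n / 4)
q3 = value_at(3 * n / 4)
print(less_than, greater_than)
print(round(median, 2), round(q1, 2), round(q3, 2))
```

The median found in this way is the abscissa of the point where the two ogives cut each other.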


Example.

The following table gives the average earnings of the mill-workers
in a certain city :

Monthly Wages    Frequency     Monthly Wages    Frequency
(in Rs.)                       (in Rs.)

    18              21             42              36
    21              29             45              45
    24              19             48              27
    27              39             51              48
    30              43             54              21
    33              94             57              12
    36              73             60               5
    39              68

Draw a histogram and a frequency curve for the data given
above. Find the number of mill-workers whose wages lie between
Rs. 31 and Rs. 53. [B. Com. Madras 1962]

We are to put the data in the form of a frequency distribution
with class-intervals, as shown below.

Monthly Wages    Frequency     Monthly Wages    Frequency
(Rs.)                          (Rs.)

18-21               21         42-45               36
21-24               29         45-48               45
24-27               19         48-51               27
27-30               39         51-54               48
30-33               43         54-57               21
33-36               94         57-60               12
36-39               73         60-63                5
39-42               68

Now the histogram can be easily drawn; for reference see Fig. 12.

For the second part, we are to make a cumulative frequency
distribution as follows :

Monthly Wages    Cum. Frequ.         Monthly Wages    Cum. Frequ.
(Rs.)            (less than type)    (Rs.)            (less than type)

less than 21          21             less than 45         422
less than 24          50             less than 48         467
less than 27          69             less than 51         494
less than 30         108             less than 54         542
less than 33         151             less than 57         563
less than 36         245             less than 60         575
less than 39         318             less than 63         580
less than 42         386

From the Graph [Fig. 16], it is clear that the number of workers
whose wages lie between Rs. 31 and Rs. 53 is 530-120, i.e. 410.

Note. For drawing the curve from less than 18, we are to start
from X-axis, i.e. the ogive should start from X-axis itself.
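The reading taken from the graph can be compared with a direct calculation on the less than type cumulative frequencies. The sketch below interpolates along straight lines between the class-boundaries, which is an assumption; a smooth curve read by eye gives the slightly different figure of 530-120 = 410 quoted above. The names `workers_below` and `WIDTH` are illustrative.

```python
from itertools import accumulate

WIDTH = 3                                   # every class is Rs. 3 wide
boundaries = list(range(18, 64, WIDTH))     # 18, 21, ..., 63
frequencies = [21, 29, 19, 39, 43, 94, 73, 68, 36, 45, 27, 48, 21, 12, 5]

# cumulative[i] = number of workers earning less than boundaries[i]
cumulative = [0] + list(accumulate(frequencies))

def workers_below(wage):
    """Workers earning less than `wage`, by straight-line interpolation."""
    if wage <= boundaries[0]:
        return 0.0
    if wage >= boundaries[-1]:
        return float(cumulative[-1])
    i = int((wage - boundaries[0]) // WIDTH)
    return cumulative[i] + (wage - boundaries[i]) / WIDTH * frequencies[i]

between = workers_below(53) - workers_below(31)
print(round(between))   # 404
```

The calculated figure of about 404 and the graphical figure of 410 agree as closely as can be expected from a reading taken off a curve.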
Cumulative Frequency Curve

Fig. 16 (X-axis : wages in Rs., 21 to 66)


DIAGRAMMATIC REPRESENTATION 


Data may also be represented in the form of a surface figure (i.e.
in the form of a diagram), other than a curve or graph. For drawing
a graph or curve, usually a graph paper is required, whereas in the case
of a diagram, plain paper may be used.

Types of Diagrams.

The following are the important types of diagrams in common
use :

(i) One dimensional diagrams, i.e. lines or bars drawn to a common
scale.

(ii) Two dimensional diagrams, i.e. rectangles, squares and circles,
whose areas are made proportional to the given figures.

(iii) Three dimensional diagrams, i.e. cubes, cylinders or blocks,
whose volumes are made proportional to the given figures.

(iv) Pictograms, i.e. statistical pictures.

(v) Cartograms, i.e. statistical maps.
Directions for Drawing Diagrams.


It is mentioned before that diagrams give a pictorial representa- 
tion of quantitative data and show nothing beyond it. Diagrams are 
not suitable for further analysis of data, which can be done from 
figures. Before drawing a diagram, one must be sure that the data 
are capable of diagrammatic representation. All types of data cannot 
be represented by diagrams. Statistical data should be homogeneous 
and comparable for such representation. For instance, a set of figures
relating to a number of sheets, a number of patients, and a number of
schools in relation to our country cannot be represented by means of
diagrams. The figures are entirely unrelated. A single figure is also
useless for such presentation.


Another point that should be kept in mind is that diagrams are
not substitutes for the real magnitude of the quantity they represent.
The size of a diagram changes with the change in the
scale to which it is drawn. The same data drawn in two different 
scales will yield diagrams of different sizes. The scale of the dia- 
grams should be appropriate and also should suit the size of the paper 
on which it is drawn. The scale should always be indicated in the 
figure, as without it, no diagram is complete. A good diagram should 
be neat, clean and appealing to eye. Each diagram should have a 
proper heading. 


All types of diagrams are not suitable to represent all types of 
data. It is essential to select that diagram which suits best to re- 
present the data given, otherwise misleading impressions may be 
created. 


One Dimensional Diagram 


(1) Simple bar-diagram consists of a number of bars of uniform
width separated by equal intervening spaces. The length of the bars
is proportional to the values they represent. The bars may be
placed vertically or horizontally. Bar-diagram is generally used to
represent a time-series. The base line should be the zero-line, when
bar-diagrams are used for comparison.

Example.
The monthly productions of bi-cycles in a factory are as follows :
January 70, February 60, March 90, April 80, May 100,
June 110. Represent by simple bar-diagram.

Scale : 1 division along Y-axis = 10 units.

Monthly Productions of Bi-cycles

Fig. 17

Example. Construct a horizontal bar-diagram showing
expenditure of First Five-year Plan in West Bengal.

                              (Crores of Rs.)
On Industries                     110.00
On Irrigation                      67.50
On Agriculture                     90.00
On Transports and Roads            42.60
On Miscellaneous                   50.00

Scale : 1 division along X-axis = 10 crores of Rs.

Expenditure in first five-year plan in West Bengal

Fig. 18

(2) Multiple (or Compound) bar-diagrams. The technique of
simple bar-diagrams may be extended to represent two or more sets of
inter-related data in one diagram. So multiple bar-diagrams supply
information on more than one phenomenon.

Example.

Population of Men and Women in districts of North Bengal,
according to the Census 1961, are shown below :

Districts           Men          Women
Darjeeling        3,34,553     2,90,326
Jalpaiguri        7,32,590     6,27,520
Cooch Behar       5,39,798     4,79,953
West Dinajpur     6,96,759     6,33,587
Malda             6,22,092     5,98,399

Represent the data by multiple bar-diagrams.

Scale : 2 divisions along Y-axis = 1 lakh.

Population in the districts of North Bengal

Fig. 19
Example.

Allotment of Money to West Bengal in first three Five-year
Plans are as follows :

Five-year Plan       1      2      3
Rs. (in crores)     70    155    340

A triple bar-diagram showing allotment of Money to West Bengal
in three Five-year Plans

Fig. 20 (X-axis : Five-year Plan)

(3) Component (or Sub-divided) bar-diagram. This diagram shows in a
striking manner the relation between the different parts and also between
the parts and the whole.


Example.

The following table shows the total cost (in rupees) and its
component parts in two consecutive years.

                     1970       1971
Direct material      4,000      5,500
Direct labour        5,000      6,000
Direct expenses      1,200      1,500
Overhead             2,500      2,000

Total               12,700     15,000

Component bar-diagram showing total cost and its components

Fig. 21 (bars for 1970 and 1971 ; shades : Overhead, Direct expenses,
Direct labour, Direct material)


(4) Sub-divided bar-diagram on percentage basis. Comparison
of the related data by the above process may be misleading in some
cases. A proper and fair comparison may be possible by placing the
related data on the same footing. Here the items constituting the aggregate
are expressed as percentages of the aggregate. The length of the
bar is equal to 100, and from this, sub-divisions are made according to
the percentages they bear to the aggregate, to represent the components.
This makes comparison very simple and clear. Use different
shades for different components.

Example.

The cost, sale proceeds and profit (or loss) per chair during
1967, '68, '69 are given below :

Particulars                  1967     1968     1969
Wages                          8        8       11
Other costs                    6       5.6       6
Polishing                      4       6.4       5
Total cost                    18       20       22
Sale proceeds                 20       20       20
(per chair)
Profit (+) or loss (-)       (+)2     nil      (-)2
(per chair)

To represent the above data by sub-divided bar-diagrams on
percentage basis.

Before constructing the diagram, we are to convert the quantities
into percentages of the sale proceeds as follows :

Particulars                  1967     1968     1969
                               %        %        %
Wages                         40       40       55
Other costs                   30       28       30
Polishing                     20       32       25
Total cost                    90      100      110
Sale proceeds                100      100      100
Profit (+) or loss (-)      (+)10     nil     (-)10
(per chair)
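The conversion into percentages of the sale proceeds can be sketched as follows. The sale proceeds are Rs. 20 per chair in each year; the 1968 figures for other costs and polishing are taken here as 5.6 and 6.4, so that the three costs add up to the stated total cost of 20. The function name `as_percent_of_sales` is an illustrative choice.

```python
SALE_PROCEEDS = 20  # Rs. per chair, the same in all three years

costs = {
    1967: {"Wages": 8, "Other costs": 6, "Polishing": 4},
    1968: {"Wages": 8, "Other costs": 5.6, "Polishing": 6.4},
    1969: {"Wages": 11, "Other costs": 6, "Polishing": 5},
}

def as_percent_of_sales(items):
    """Each cost as a percentage of the sale proceeds, plus profit (+) or loss (-)."""
    shares = {name: round(100 * amount / SALE_PROCEEDS, 1)
              for name, amount in items.items()}
    shares["Profit or loss"] = round(100 - sum(shares.values()), 1)
    return shares

percent = {year: as_percent_of_sales(items) for year, items in costs.items()}
for year, shares in percent.items():
    print(year, shares)
```

Each bar is drawn 100 units long; a loss, as in 1969, is shown by carrying the bar below the base line.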


Percentage of cost, proceeds, profit (or loss) per chair
during 1967, '68, '69

Fig. 22 (shades : Wages, Other costs, Polishing, Profit)

Two Dimensional Diagram.


(1) Rectangular diagram. In one dimensional diagrams, as discussed
previously, only the length of a bar was taken into consideration and
not the width. But in two dimensional diagrams, both the length and
width are to be taken into consideration. The area of a rectangle is
equal to the product of its length and breadth. The area of a rectangle
represents the size of the item. The process is similar to that of
sub-divided bar-diagrams on percentage basis, except for the widths, which
vary in proportion to the aggregate of each item.


Example. 


The student population of the colleges A and B in the different
departments are shown below :

              College A     College B
Arts             800           400
Science          500           200
Commerce         900           250
Law              300           150

To represent by the rectangular diagrams.

The aggregates of the students of the two colleges A and B
are 2500 and 1000 respectively. So the widths of the rectangles will
be proportional to 2500 : 1000 or 5 : 2 (the lengths of the rectangles
should be the same). Conversion of the figures into percentages of the
aggregate is required before constructing the diagrams. The conversions
are given below :

                  College A                 College B
             Students   Percentage     Students   Percentage

Arts            800         32            400         40
Science         500         20            200         20
Commerce        900         36            250         25
Law             300         12            150         15

Total          2500        100           1000        100

Now instead of showing percentages, the actual figures can also
be drawn. In such a case, the width and length of the rectangles will
vary.
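The two steps of the rectangular diagram, namely the ratio of the widths and the percentage sub-divisions, can be sketched in Python as below. The names `percentages` and `width_ratio` are illustrative and not part of the text.

```python
from math import gcd

students = {
    "College A": {"Arts": 800, "Science": 500, "Commerce": 900, "Law": 300},
    "College B": {"Arts": 400, "Science": 200, "Commerce": 250, "Law": 150},
}

def percentages(departments):
    """Share of each department in the aggregate of its college."""
    total = sum(departments.values())
    return {name: 100 * count / total for name, count in departments.items()}

# Widths of the rectangles are proportional to the aggregates.
total_a = sum(students["College A"].values())
total_b = sum(students["College B"].values())
common = gcd(total_a, total_b)
width_ratio = (total_a // common, total_b // common)

print(width_ratio)
for college, departments in students.items():
    print(college, percentages(departments))
```

Both rectangles are drawn with the same length of 100 and are sub-divided according to the percentages, while their widths are kept in the ratio found above.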
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Rectangular diagram showing student population of colleges A and B 


COLLEGE 
8 


Fig. 23 
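
The conversion used for the rectangular diagram can be checked with a short Python sketch. The student figures are those of the example; the helper name percentage_breakdown and the variable names are illustrative assumptions only.

```python
from math import gcd

def percentage_breakdown(counts):
    """Express each component as a percentage of the aggregate."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}

# Student population of the two colleges (from the example).
college_a = {"Arts": 800, "Science": 500, "Commerce": 900, "Law": 300}
college_b = {"Arts": 400, "Science": 200, "Commerce": 250, "Law": 150}

total_a = sum(college_a.values())   # 2500
total_b = sum(college_b.values())   # 1000

# Widths of the rectangles are proportional to the aggregates.
g = gcd(total_a, total_b)
width_ratio = (total_a // g, total_b // g)

pct_a = percentage_breakdown(college_a)
pct_b = percentage_breakdown(college_b)

print("width ratio A : B =", width_ratio)
print("College A percentages:", pct_a)
print("College B percentages:", pct_b)
```

The percentages fix how each rectangle is sub-divided along its length, while the width ratio carries the difference between the two aggregates.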
Example.


Cost of Production, Profits and No. of Units produced by two
factories A and B.


Particulars                  Factory A (in Rs.)    Factory B (in Rs.)


Raw materials                      100                    50
Wages                              250                   100
Total costs                        350                   150
Profits                            150                    90
Total sales                        500                   240
No. of units (produced)            100                    80


From the above table, the sale prices (per unit) of factories A and B
are respectively Rs. 5 (500 ÷ 100) and Rs. 3 (240 ÷ 80). The widths
of the two rectangles would be in the ratio 5 : 3, and the lengths in
the ratio 100 : 80. The rectangles would indicate the total
sale-proceeds, within which divisions would be made for representing
the items of cost and profit.


In the first rectangle the items profit, wages and materials would
be in the ratio of 150 : 250 : 100 i.e., 3 : 5 : 2, and the vertical
scale is divided in that ratio. The second rectangle is treated
similarly.

Cost of Production, Sale-proceeds and Profits of a Commodity
in Factories A and B


Fig. 24


(2) Square diagrams. If it is required to compare quantities
in the ratio of 1 : 25, then bar-diagrams become unsuitable, since the
height of one bar would have to be 25 times that of the other. One bar
will be too small, and the other too tall. In such cases, square
diagrams give a better result.


Now the side of a square varies as the square root of its area. So
for representing two figures 100 and 2500 by squares, the sides
should be in the ratio of √100 : √2500 i.e. 10 : 50 and not in the
ratio of 100 : 2500.


The method is simple. At first the square roots of the given
values are taken. The sides of the squares are made in proportion to
these square roots. The squares of the sides represent the data.
The squares should be placed on a common base and the diagram must be
accompanied by the scale of construction.

Example.

Main heading of income            1948-49
of the Central Govt.          (in Lakhs of Rs.)
Income tax                        13,998
Import duty                        7,274
Production duty                    5,063
Other taxes                          319


It is required to represent the figures by square-diagrams.


Calculation.

Heading              Amount          Square      Side of square (cms)
                 (Rs. in lakhs)      roots          Col. (3) ÷ 50
  (1)                 (2)             (3)               (4)
Income tax          13,998           118.3             2.366
Import duty          7,274            85.29            1.706
Production duty      5,063            71.15            1.423
Other taxes            319            17.86             .357


Income of the Central Govt. (1948-49)
Scale : 1 sq. cm. = Rs. 2,500 lakhs (app.)


[Squares, left to right : Income tax, Import duty, Prod. duty, Other taxes]
Fig. 25


Calculation of Scale. The area of any one square is calculated
first. The area of the square representing other taxes is
.357 × .357 = .1274 sq. cm. This area represents an income of
Rs. 319 lakhs. So 1 sq. cm. would represent Rs. 319 ÷ .1274, i.e.,
about Rs. 2,500 lakhs.
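
The sides of the squares and the scale of construction can be verified with the following Python sketch. The income figures and the divisor 50 are taken from the example; the function name square_sides is an assumption made for illustration.

```python
from math import sqrt

# Income of the Central Govt., 1948-49 (Rs. in lakhs), from the example.
income = {
    "Income tax": 13998,
    "Import duty": 7274,
    "Production duty": 5063,
    "Other taxes": 319,
}
DIVISOR = 50  # common factor used to bring the sides to a drawable size

def square_sides(values, divisor):
    """Side of each square (cm) = square root of the value / divisor."""
    return {name: sqrt(v) / divisor for name, v in values.items()}

sides = square_sides(income, DIVISOR)
for name, side in sides.items():
    print(f"{name:16s} side = {side:.3f} cm")

# Scale: value represented by one square centimetre of area.
scale = income["Other taxes"] / sides["Other taxes"] ** 2
print("1 sq. cm. represents about", round(scale), "lakhs")
```

Since each side is the square root divided by 50, every square centimetre stands for 50 × 50 = 2,500 lakhs, whichever square is used for the check.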

(3) Circles. Circular diagrams are used in almost all those
cases where square diagrams are used. The reason is that the area of
a circle varies as the square of its radius, just as the area of a
square varies as the square of its side. In the construction of
circles, as with squares, the square roots of the various figures are
calculated and the radii of the circles are kept in the ratio of these
square root values.

Example.


The example of the previous page (used for squares) is taken for the
construction of circles. Only Col. 4 is to be read as 'radius of
circle' instead of 'side of square'.


Calculation of Scale. The area of the last circle, representing
the income from 'other taxes', is π × (.357)² i.e., about .400 sq. cm.
Since this area represents Rs. 319 lakhs, 1 sq. cm. = Rs. 797 lakhs
(app.).


Income of the Central Govt. (1948-49)


Scale : 1 sq. cm. = Rs. 797 lakhs (app.)
[Circles : Income tax, Import duty, Production duty, Other taxes]


Fig. 26


Circular diagram (or Pie diagram). It is a pictorial diagram in
the form of a circle, where the whole area represents the aggregate and
the different sectors into which the circle is divided represent
the different components.


For drawing a circular diagram, the different components are first
expressed as percentages of the whole. Now since 100% corresponds to
the full angle of 360° at the centre of a circle, 1% corresponds to
3.6 degrees. If p be the percentage of a certain component to the
aggregate, then (p × 3.6) degrees will be the angle which the
corresponding sector subtends at the centre.


Example.


The expenditure during the Second Five-year Plan in West Bengal is
shown below :

                            (Rs. in Crores)


On Industries                   127.00
On Irrigation                    92.50
On Agriculture                  100.00
On Transports & Roads            92.50
On Miscellaneous                 68.00

                                480.00


To represent the data by a circular diagram :


First we express each item as a percentage of the aggregate.


Industries          = (127.00 ÷ 480.00) × 100 = 26.4
Irrigation          = 19.3
Agriculture         = 20.8
Transports & Roads  = 19.3
Miscellaneous       = 14.2


Now 1% corresponds to 3.6 degrees. So the angles at the centre
of the corresponding sectors are (in degrees)


Industries      = 26.4 × 3.6 = 95.0
Irrigation      = 19.3 × 3.6 = 69.5
Agriculture     = 20.8 × 3.6 = 74.9
Transp. & Rds.  = 19.3 × 3.6 = 69.5
Miscellaneous   = 14.2 × 3.6 = 51.1


Now with the help of a compass and a protractor (or diagonal scale)
the diagram is drawn.
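
The percentages and sector angles can be recomputed with a few lines of Python. This is only a sketch: the function name pie_angles is an assumption, and the figures are computed without intermediate rounding, so they may differ in the first decimal from the hand-rounded values above.

```python
# Expenditure during the Second Five-year Plan (Rs. in crores), from the example.
expenditure = {
    "Industries": 127.0,
    "Irrigation": 92.5,
    "Agriculture": 100.0,
    "Transports & Roads": 92.5,
    "Miscellaneous": 68.0,
}

def pie_angles(components):
    """Return {name: (percentage, angle in degrees)} for a pie diagram."""
    total = sum(components.values())
    result = {}
    for name, value in components.items():
        pct = 100 * value / total
        result[name] = (pct, pct * 3.6)   # 1% of the circle = 3.6 degrees
    return result

angles = pie_angles(expenditure)
for name, (pct, deg) in angles.items():
    print(f"{name:20s} {pct:6.2f}%  {deg:7.2f} degrees")

total_pct = sum(p for p, _ in angles.values())
total_deg = sum(d for _, d in angles.values())
print("check:", round(total_pct, 6), "per cent,", round(total_deg, 6), "degrees")
```

The two totals serve as the check mentioned in the text: the percentages must add to 100 and the angles to 360°.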


Expenditure during Second Five-year Plan in West Bengal

Note. The sum of the percentages of all the items should be equal to
100, and the sum of all the angles should be equal to 360° (app.).


If two aggregates with their components are to be compared, then
two circles are required to be drawn having areas proportionate to the
ratio of the two aggregates.


Example. The production costs of two manufacturers A and B.


Particulars         Manufacturer A         Manufacturer B
                   (Rs. in thousand)      (Rs. in thousand)
Material                 27.7                   52.2
Wages                    37.7                   60.5
Expenses                 16.4                   32.4
Fact. overhead           18.2                   42.9
Total                   100.0                  188.0


The radii of the two circles should be in the ratio of √100 : √188
i.e., 10 : 13.71 i.e., 10 : 14 (app.) i.e., 5 : 7.


Calculation.

                      Manufac. A                Manufac. B
Particulars       Rs. ('000)   degrees      Rs. ('000)   degrees
Material             27.7        99.7          52.2        100
Wages                37.7       135.7          60.5        116
Expenses             16.4        59.0          32.4         62
Fact. overhead       18.2        65.5          42.9         82


Total               100.0       360 (app.)    188.0        360


Three Dimensional Diagrams. 


Cubes. In calculating the volume of a cube, three dimensions
(length, breadth and depth) are to be counted. Hence a cube is a three
dimensional diagram, and is also known as a volume diagram, whereas
the two dimensional diagrams discussed previously are known as surface
diagrams.


Example. 


The following table gives the amount deposited in a new branch
office of a certain bank for the first four months.


Month      (Rs. in thousand)
  1               12
  2               70
  3              150
  4              270


To represent the above figures by cubes, we take the cube roots of
the figures; the sides of the cubes should be in proportion to these
cube roots (which, if necessary, may be divided by a common factor).


Calculation.


Month     (Rs. in '000)     Cubic roots     Side of cubes (cm.)
                                               Col. (3) ÷ 3
 (1)           (2)              (3)                (4)
  1             12              2.29                .76
  2             70              4.12               1.37
  3            150              5.31               1.77
  4            270              6.46               2.15
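
The sides of the cubes can be checked with the following Python sketch. The deposits and the divisor 3 are those of the example; the function name cube_sides is assumed for illustration.

```python
# Deposits in the first four months (Rs. in thousand), from the example.
deposits = {1: 12, 2: 70, 3: 150, 4: 270}

def cube_sides(values, divisor=3):
    """Side of each cube (cm) = cube root of the value / common divisor."""
    return {month: v ** (1 / 3) / divisor for month, v in values.items()}

sides = cube_sides(deposits)
for month, side in sides.items():
    print(f"month {month}: cube root = {side * 3:.2f}, side = {side:.2f} cm")
```

Because the volume varies as the cube of the side, a deposit more than twenty times the first needs a cube whose side is less than three times as long.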


Deposits in 4 months


Fig. 29

Note. Cylinders and spheres are also three dimensional
diagrams. They are not discussed here, since difficult
calculations are required for their construction.


Pictograms.


Pictures (i.e. pictograms) are very popular for representing
data. The method is quite effective and has the advantage of
being easily understood by the common man. Each symbol or picture
represents a definite numerical value. If a fraction of the numerical
value represented by a symbol occurs, then the proportionate part
of the picture, measured from the left-hand side, is drawn.


Example.
The table below shows the number of Primary Schools in a
certain district for the years 1950, 1960 and 1970.

Years                       1950    1960    1970
No. of Primary Schools        20      90     230


Number of Primary Schools in 1950, '60, '70
One figure represents 10 Schools


Fig. 30
Cartograms. 


Cartograms or statistical maps are used to represent quantitative
data on a geographical basis. The quantities on the map are shown
through shades, colours, or pictograms. Such maps should be
used only where geographical comparisons are of primary importance.
The drawing is not shown here, since such maps are not in common use.

EXERCISE 3

1. Explain clearly the difference between the natural scale and the
logarithmic scale used in graphical presentation of data. [I.C.W.A. Jan. 1971]

2. Write short notes on (a) Historigram (b) Ogive (c) Frequency
Curve (d) Frequency Polygon.

3. Represent graphically the following data, relating to cheque
clearance :

Cheque clearance (Crores of Rs.)


Month  Jan. Feb. Mar. Apr. May  Jun. July Aug. Sep. Oct. Nov. Dec.


Year

1958   832  765  873  792  791  663  884  754  806  799  773  887

1959: 894) 695)..9465 9461, 8490 
[C. U. M. Com. 1961]


4. Represent the following information graphically and also
draw a graph on the same sheet to show the balance of trade. 


Indian Export and Import in millions of rupees 


Period Import Export Period Import Export 
1946 April 217 213 1947 Jan, 825 364 
May 218 304 Feb. 320 255 
June 205 954 ‘March 336 307 
July 263 238 April 360 258 
Aug. 227 211 May 409 362 
Sept. 289 200 June 885 854 
Oct. 299 259 July 436 286 
Nov. 313 253 
Dec. 825 330 [B.Com. Madras]
5. Represent the following data about a country by a suitable
graph :
Production in million tonnes
Year Rice Wheat Pulses Other cereals 
1962 80°4 10 8 16 
68 32 11 10 18 
64 33 85 11'5 20 
65 85 12 ll 93 
66 86°5 10 10 21 


67 38 11 9 94 


6. The following table shows the foreign trade of Japan.
Represent the figures by a suitable graph.


Foreign Trade (value ; million dollars) 


Year Exports |. Imports | Excess of Imports 


1940 857 809 48 
103 806 —202 


(Source : Ministry of Finance, Japan)


7. What is a false base line ? Under what circumstances should
it be used ? The following data give the index numbers of industrial
profits in India. Represent them graphically.


Year    Index Number        Year    Index Number
        (1929 = 100)                (1929 = 100)
1941 187 1946 229 
42 222 47 192 
43 246 48 260 
44 239 49 182 
45 234 50 247 


8. Explain what is a semi-logarithmic graph. What purpose is
served by such graphs and what are their uses ? [I. C. W. A. Jan. 1969]


9. From the following table, draw a ratio chart on a graph
paper :

Year              1937  '38  '39  '40  '41  '42  '43  '44  '45
Units Produced       2    4    8   16   32   64  128  256  512
[C. U. M. Com. 1956]

10. Represent graphically from the following data the growth
of the population of a state in India to show both relative growth and
growth by absolute amount :

Census year    1871   '81   '91  1901   '11   '21   '31   '41

Population
(in lakhs)     50.0  52.4  55.6  59.5  65.0  68.7  73.1  77.5


11. The profits of a particular firm and of the whole industry
are given in the following table :


Year                  1950    '51    '52    '53    '54    '55    '56
Firm
(in Rs. 10,000)       1.50   2.00   2.18   2.69   3.50   6.00  14.00
Industry
(in Rs. 10,00,000)    3.20   4.00   5.40   6.80   8.00  11.00  19.00

Compare the trends of profit by semi-logarithmic graphs and
comment on the performance of the firm in relation to that of the
industry. [I. C. W. A. Jan. 1969]

12. The following table shows the values of a variable y
corresponding to some given equidistant values of the independent
variable x :


Cor eae a Kigh tdsshatahi: hae 23sec iahacy O 
Parc 132 214 330 486 688 949 
Draw a semi-logarithmic chart and find by graphical interpolation
the value of y, when x = 10.5. [I. C. W. A. Jan. 1971]


13. (a) What is meant by a histogram ? Construct a histogram
from the following :

class limit   90-100  100-110  110-120  120-130  130-140  140-150  150-160

frequency       16       22       45       60       50


(b) Calculate the number of cases between 112 and 134
(c) Number less than 119
(d) Number greater than 134

[C. A. May 1969]


14. The population of six States in India are given in the
following table. Represent the data by bar-diagram.

States               Population
Uttar Pradesh         88341144
Bihar                 56353369
Maharashtra           50412235
West Bengal           44312011
Andhra Pradesh        43502708
Tamil Nadu            41199168


[Source : Census 1971]
15. Represent the following data by a suitable diagram :


Educated (graduate and post-graduate) unemployed in India


Year       1969       1970       1971       1972 (as on 30. 6. 72)


Number    186,436    232,250    333,491    463,519

[Source : Register Employment Exchange]


16. Describe the advantages of diagrammatic representation of
statistical data. Name the different types of diagrams commonly
used, and mention the situations where the use of each type of
diagram would be appropriate. [I. C. W. A. June 1975]


17. Represent the following data by a suitable diagram showing
the difference between proceeds and costs :


Proceeds and Costs of a Firm (in thousands of Rupees)


Year      Total Proceeds      Total Costs
1950           22.0               19.5
  51           27.3               21.7
  52           28.2               30.0
  53           30.3               25.6
  54           32.7               26.1
  55           33.8               34.2


[I. C. W. A. Jan. 1966]


18. The following table gives the average approximate yield of 
rice in lbs. per acre in various countries of the world in 1938-39. 


Country India Siam U.S.A. Italy Egypt Japan
Yield in Ibs. 728 948 «1,469 «3,908 «3,158,976 
per acre. 


Indicate this by a suitable diagram which will highlight the
relative backwardness of India in this regard. [I. C. W. A. Jan. 1964]

19. Represent the following table by sub-divided bars drawn
on a percentage basis :


Cost, proceeds and profit or loss per table


Particulars                             1951      1956

Cost per table :                         Rs.       Rs.
(a) Wages                                 21         9
(b) Other costs                           14         6
(c) Polishing                              7         3

Total costs                               42        18
Proceeds per table                        40        20
Profit (+) or loss (−) per table       (−) 2     (+) 2

[B. Com. Allahabad]


20. Represent the information contained in the following table
in a component bar-diagram :


Commodity Pattern of India's Exports (Percentage)


                       1956-57    1957-58    1958-59
Capital goods             0.29       0.31       0.30
Intermediate goods       45.82      46.87      44.19
Consumer goods           50.50      47.32      48.19
Unclassified              3.39       5.50       7.32
Total                   100.00     100.00     100.00


[C. U. B. Com. (Hons.) 1967]
21. The following table shows the number of bushels of wheat
and corn produced in a farm during the years 1950 to 1960.


Express the yearly number of bushels of wheat and corn as
percentages of total annual production. Graph the percentages by
component bar-diagrams.

Year                     1950  '51  '52  '53  '54  '55  '56  '57  '58  '59  '60
No. of bushels of Wheat   200  185  225  250  240  195  210  225  250  230  235
No. of bushels of Corn     75   90  100   85   80  100  110  105   95  110  100
[Dip. Soc. Welfare, May 1968]

22. The following table shows the monthly expenditure of two
families. Represent the data by means of two-dimensional rectangular
diagrams.


Item of Expenditure        Family A                Family B
                      Income Rs. 600 p.m.     Income Rs. 1000 p.m.
Food                        200                     360
Clothing                     80                     120
House Rent                  120                     200
Education                   100                     120
Miscellaneous                50                     100


[Hints : Savings will be Rs. 50 and Rs. 100 respectively for the
two families.]


23. Draw a suitable diagram to represent the following
information :


Factory     Wages     Materials     Profits     Units produced
            (Rs.)       (Rs.)        (Rs.)
A           2,000       3,000        1,000          1,000
B           2,200       2,400        1,000            800


Show also the cost and profit per unit. [B.Com. Allahabad]

24. Represent by square diagrams the following data :
Educated (under graduate) unemployed persons in India


Year                      1969    1970    1971    1972 (as on 30. 6. 72)

No. of persons
(in '000)                  356     395     529     663


[Source : Employment Exchange]


25. Construct a pie-diagram for the following data :
Principal Exporting Countries of Cotton
(1,000 bales), 1955-56
U.S.A.     India     Egypt     Brazil     Argentina
6,367      2,999     1,688       650         202
[C. U. M. Com. 1959]

26. Of the Life Insurance policy dividends paid in the United
States 21% were taken in cash, 31% were used to pay premiums, 18%
were used to purchase additional paid-up Life Insurance, 30% were
left with Life Insurance companies to earn interest. Construct a
pie-diagram showing these different uses of policy dividends.


[C. U. M. Com. 1962]


27. The following data represent the share of important pro- 
ducing states in the total area and production of Tobacco in India, 
during the year 1957-58. 


Draw two pie-diagrams to represent the information :


Percentage 
States | 


Area Production 


Andhra 39°1 
Bombay 25°5 
Mysore 111 
West Bengal 4°5 


[Source : Tobacco India, 1957-58]

28. Represent the following figures by cubes :
Number of Students during 1971-72, in India
               Primary Schools     All Schools     Universities
(in Lakhs)          686                840              24
[Source : Census 1971]

29.
Year                   1863     1864     1865     1866     1867
No. of Fact. in
European Russia       11800    12000    13700     6900     7100
Present the above data in a graph. [I. C. W. A. June 1979]

30. The table below shows the exports of woven-piece goods
in Million Square Yards during some months in a year :


            Cotton     Wool
April         96        15
May           78        10
June          72         9
July          65        10
August        77        10


Make a graphical comparison, by a compound bar-chart, of
the volume of exports given in the above table.
[I. C. W. A. June 1979]


31. Draw a histogram and a frequency polygon to present the
following data :


Income (Rs.)     No. of Individuals
100-149                 21
150-199                 32
200-249                 52
250-299                105
300-349                 62
350-399                 43
400-449                 18
450-499                  9

                       342


[I. C. W. A. June 1978]


32. Discuss the types of data which are usually represented by
Pie-diagrams. State how they are drawn.


Represent the following data by a bar-diagram :
Production of Sugar in a certain year
in quintals (000,000)

Cuba          32
Australia     30
India         20
Japan          5
Java           1
Egypt          1

              89


[I. C. W. A. June 1978]

33. Draw a pie-diagram to represent the following population
in a town :
Males     Females     Girls     Boys     Total
2,000      1,800      4,200    2,000    10,000
[I. C. W. A. Dec. 1977]

34. Draw the graph of the following :
Year                 1920   '21   '22   '23   '24   '25   '26   '27
Yield
(in million tons)    12.8  13.9  12.8  13.9  13.4   6.5   2.9  14.8
[I. C. W. A. Dec. 1977]

35. Explain the use of various diagrams in presenting statistical
data. [I. C. W. A. June 1976]


36. The following data show the estimated savings of the
household sector in India during 1962-63, as revealed by the C.S.O.


Form of Savings        Amount (Rs. crores)
Currency                     175
Provident Fund               145
Physical                     158
Others                       440


Present the information in a suitable diagram so as to enable
comparison among the various components and also in relation to the
total. [C. U. B. Com. (Hons.) 1980]


37. A ship has four compartments labelled 1, 2, 3 and 4. The
space limits of 1, 2, 3 and 4 are respectively 180,000 cubic feet,
160,000 cubic feet, 140,000 cubic feet and 120,000 cubic feet. Present
the data about the different space limits in a table and draw a Pie
diagram to represent the above data. [I. C. W. A. June 1980]


INTERPOLATION 


Introduction 


The term interpolation means framing the most appropriate
estimate of a missing quantity, under certain reasonable assumptions.


In the Chapter on Averages (discussed later on), the median and
mode are interpolated in the median and modal classes respectively.
This, of course, is done by proceeding with certain assumptions.
Again let us take the following inter-related values of two variables
x and y.


Now for x = 3, y = 12.0 and for x = 4, y = 13.6, but we do not
know the value of y when x is 3.6. The technique of estimating
the value of y for x = 3.6 is called interpolation. Again the
technique of estimating a past figure is termed interpolation, while
that of estimating a probable figure for the future is called
extrapolation.


Assumptions. 


We cannot supply the missing figure just arbitrarily, but we are
to make the most appropriate estimate. Making this estimate
requires certain assumptions, which are as follows :


1. There are no sudden jumps from one period to another i.e.,
the distribution should be normal. If, however, there are violent
disturbances, estimation by interpolation will be impossible.


2. The rate of change of figures from one period to another is
uniform.

Finite Difference


In problems of interpolation, the independent variable is known
as the argument, while the dependent variable is usually called a
function (or entry) of the former. Let $x_0, x_1, x_2, \ldots, x_n$ be
successive values of the argument having a constant increment, and
$y_0, y_1, y_2, \ldots, y_n$ the corresponding functions; then
$(y_1 - y_0), (y_2 - y_1), \ldots, (y_n - y_{n-1})$ are called finite
differences of first order (or simply first differences). These
differences are denoted respectively by
$\Delta y_0, \Delta y_1, \ldots, \Delta y_{n-1}$, i.e.

$$\Delta y_0 = y_1 - y_0,\quad \Delta y_1 = y_2 - y_1,\quad \ldots,\quad \Delta y_{n-1} = y_n - y_{n-1}.$$

Proceeding similarly with the first differences, we have
$(\Delta y_1 - \Delta y_0), (\Delta y_2 - \Delta y_1), \ldots, (\Delta y_{n-1} - \Delta y_{n-2})$,
which are known as finite differences of second order (or second
differences), denoted respectively by
$\Delta^2 y_0, \Delta^2 y_1, \ldots, \Delta^2 y_{n-2}$, i.e.

$$\Delta^2 y_0 = \Delta y_1 - \Delta y_0,\quad \Delta^2 y_1 = \Delta y_2 - \Delta y_1,\quad \ldots,\quad \Delta^2 y_{n-2} = \Delta y_{n-1} - \Delta y_{n-2}.$$

In the same way, third differences $\Delta^3 y$, fourth differences
$\Delta^4 y$ etc., may be calculated.


A numerical example will make it clear.


 x      y      Δy     Δ²y    Δ³y    Δ⁴y
 4      7
                3
 6     10              1
                4             0
 8     14              1             1
                5             1
10     19              2
                7
12     26

In the above table, 7 is the leading term and 3, 1, 0, 1 are the
leading differences.

The Symbolic Operator E

The operation E, defined by $E f(x) = f(x+h)$, may be repeated and we
write,
$$E^2 f(x) = E(E f(x)) = E(f(x+h)) = f(x+2h)$$
$$E^3 f(x) = E[E(E f(x))] = E[f(x+2h)] = f(x+3h)$$
Similarly, $E^n f(x) = f(x+nh)$.
Now $\Delta f(x) = f(x+h) - f(x) = E f(x) - f(x) = (E-1) f(x)$


or, $\Delta = E - 1$ or, $E = 1 + \Delta$, which means the operation by
E is equivalent to the operation by $1 + \Delta$.


We know, $\Delta^2 y_0 = \Delta y_1 - \Delta y_0 = (y_2 - y_1) - (y_1 - y_0) = y_2 - 2y_1 + y_0$
Similarly $\Delta^3 y_0 = y_3 - 3y_2 + 3y_1 - y_0$,
$\Delta^4 y_0 = y_4 - 4y_3 + 6y_2 - 4y_1 + y_0$ and so on.
Now we can write,
$$\Delta y_0 = E y_0 - y_0$$
$$\Delta^2 y_0 = E^2 y_0 - 2E y_0 + y_0$$
$$\Delta^3 y_0 = E^3 y_0 - 3E^2 y_0 + 3E y_0 - y_0$$
$$\Delta^4 y_0 = E^4 y_0 - 4E^3 y_0 + 6E^2 y_0 - 4E y_0 + y_0$$
Again, taking the operators only (i.e., removing $y_0$),
$\Delta = E - 1$, $\Delta^2 = E^2 - 2E + 1 = (E-1)^2$,
$\Delta^3 = E^3 - 3E^2 + 3E - 1 = (E-1)^3$ and so on.
In general, $\Delta^n = (E-1)^n$.
Example.
Express $\Delta^4 y_0$ in terms of $y_0, y_1, y_2, \ldots$
$$\Delta^4 y_0 = (E-1)^4 y_0 = (E^4 - 4E^3 + 6E^2 - 4E + 1)\,y_0$$
$$= E^4 y_0 - 4E^3 y_0 + 6E^2 y_0 - 4E y_0 + y_0$$
$$= y_4 - 4y_3 + 6y_2 - 4y_1 + y_0.$$
Differences of Polynomial Function.


If y is a polynomial of nth degree, then the differences of order
higher than n are all zero.


Example.


Find out with the help of E the value of the production for the
year 1964 from the following table :


Year     Production          We may write the table as follows :
         ('000 tons)
1961        15               1961    y0    15
  62        18                 62    y1    18
  63        20                 63    y2    20
  64         ?                 64    y3     ?
  65        23                 65    y4    23


Since here only 4 values are known, y may be taken as a polynomial
of 3rd degree, and hence the differences of 4th order will
be zero i.e., $\Delta^4 y_0 = (E-1)^4 y_0 = 0$,

or, $E^4 y_0 - 4E^3 y_0 + 6E^2 y_0 - 4E y_0 + y_0 = 0$

or, $y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0$

or, $23 - 4y_3 + 6 \times 20 - 4 \times 18 + 15 = 0$

or, $y_3 = 21.5$.


The required estimated production in 1964 will be 21.5
(in '000 tons).
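
The same reasoning can be carried out for any equidistant table with a single gap. In the Python sketch below, the function name fill_missing and the use of None to mark the gap are assumptions made for illustration; the data are those of the production example.

```python
from math import comb

def fill_missing(ys):
    """Estimate the single missing entry (marked None) of an equidistant
    series by setting the highest-order leading difference to zero."""
    n = len(ys) - 1                      # order of the difference set to zero
    missing = ys.index(None)
    # coefficient of y_k in the expansion of (E - 1)^n y_0
    coeff = [(-1) ** (n - k) * comb(n, k) for k in range(n + 1)]
    known = sum(c * y for c, y in zip(coeff, ys) if y is not None)
    return -known / coeff[missing]

production = [15, 18, 20, None, 23]      # 1961 to 1965, value for 1964 missing
estimate = fill_missing(production)
print("estimated production in 1964:", estimate, "('000 tons)")
```

The coefficients generated are 1, −4, 6, −4, 1, i.e. exactly those of the expansion of (E − 1)⁴ used above.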
Methods of Interpolation. 


There are two methods of interpolation. They are :
1. Graphic method.


2. Algebraic method.


GRAPHIC METHOD


This is the simplest of all the methods of interpolation. The
statistical data are plotted on a graph paper. After this a
continuous smoothed curve is obtained by joining the plotted
points. On the X-axis we take the period and on the Y-axis the
corresponding variable. For the period for which the variable is to be
calculated, a perpendicular is drawn from that period (on the X-axis)
to meet the smoothed curve. From the point where it meets the
curve, another perpendicular is drawn on the Y-axis. The point on
the Y-axis is read off, which is the required value. The idea will be
clear from the example.

Example. From the following data determine the population (of
a certain city) in 1946 and find also the increase of population
between 1936 and 1946.


Year     Population
          (lakhs)
1931        34
  41        39
  51        45
  61        49
  71        52


Fig. 31


The graph is drawn by the usual process. Now from the graph it is
clear that the population in 1946 was 42 lakhs, that in 1936 was 36.5
lakhs, and the increase of population was 5.5 lakhs (= 42 − 36.5).
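
A reading taken from the graph can be cross-checked numerically. The Python sketch below joins adjoining points by straight lines, which is only a stand-in for the smoothed curve of the graphic method; the function name interpolate_linear is an assumption.

```python
def interpolate_linear(points, x):
    """Read off y at x by joining adjoining tabulated points with straight
    lines (a simple substitute for reading a smoothed curve)."""
    pts = sorted(points)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the tabulated range")

# Census years and population in lakhs, from the example.
census = [(1931, 34), (1941, 39), (1951, 45), (1961, 49), (1971, 52)]

p1936 = interpolate_linear(census, 1936)
p1946 = interpolate_linear(census, 1946)
print("1936:", p1936, "lakhs; 1946:", p1946, "lakhs")
print("increase 1936 to 1946:", p1946 - p1936, "lakhs")
```

For these data the straight-line readings agree with the values read from the graph; with a strongly curved series the two would differ.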


Note. Although it is a simple method, it is not an accurate one.
The larger the range of the figures, the narrower the scale that has
to be taken on the graph, and consequently the greater will be the
error of approximation.


ALGEBRAIC METHOD 


Under this method, there are several formulae, some of which are
given below :

1. Newton's Formula

2. Lagrange's Formula

3. Method of Binomial Expansion

(1) Newton's Formula


Newton's Forward Formula. This formula is suitable when
the figure to be interpolated is in the beginning of the table and the
values of the argument are equidistant. In the formula, we take only
the leading differences into account, and these differences are always
in the beginning. Newton's Forward Formula is expressed as follows :

$$y_x = y_0 + x\,\Delta y_0 + \frac{x(x-1)}{1\times2}\,\Delta^2 y_0 + \frac{x(x-1)(x-2)}{1\times2\times3}\,\Delta^3 y_0 + \frac{x(x-1)(x-2)(x-3)}{1\times2\times3\times4}\,\Delta^4 y_0 + \cdots$$

where $y_x$ is the figure to be interpolated, $y_0$ is the value at the
origin, and the $\Delta$'s are the differences between adjoining values.
x is calculated as follows :

$$x = \frac{\text{figure of interpolation} - \text{figure of origin}}{\text{distance between adjoining figures}}$$

Example. The following are the annual premiums for a policy
of Rs. 1,000. Calculate the premium at the age of 32.


Age (in yrs.)     20     25     30     35     40

Premium (Rs.)     24     27     31     36    42.05


Age (yrs.)   Premium            Differences
    x           y         Δ      Δ²     Δ³     Δ⁴
   20          24
                          3
   25          27                1
                          4             0
   30          31                1            .05
                          5            .05
   35          36               1.05
                         6.05
   40         42.05


$$x = \frac{\text{year of interpolation} - \text{year of origin}}{\text{difference between adjoining values of } x} = \frac{32-20}{5} = \frac{12}{5} = 2.4$$

Now putting the above respective values in the formula of Newton
we find,

$$y_{32} = 24 + 2.4\times3 + \frac{2.4(2.4-1)}{1\times2}\times1 + \frac{(2.4)(2.4-1)(2.4-2)}{1\times2\times3}\times0 + \frac{(2.4)(2.4-1)(2.4-2)(2.4-3)}{1\times2\times3\times4}\times(.05)$$

= 24 + 7.2 + 1.68 + 0 − .00168 = 32.87832 = 32.88 (app.)
the reqd. premium = Rs. 32.88


Example.


The following table depicts the number of persons earning
certain grades of wages. Estimate the number of persons earning
between Rs. 60 and Rs. 70 per mensem.


Wages per mensem     Number of persons earning
     (Rs.)                 (thousands)
below 40                      250
40-60                         120
60-80                         100
80-100                         70
100-120                        50


[M. A. (Agra) 1951]


At first we are to estimate the number of persons earning below
Rs. 70. From this estimated number we are to deduct the number
earning below Rs. 60, for finding the required estimated number.


Wages (Rs.)    No. of persons            Differences
                 (cum. fr.)       Δ      Δ²      Δ³      Δ⁴
below  40          250
                                 120
below  60          370                  −20
                                 100            −10
below  80          470                  −30              20
                                  70             10
below 100          540                  −20
                                  50
below 120          590

$$x = \frac{70-40}{20} = \frac{30}{20} = 1.5$$

Putting the above respective values in Newton's formula we get,

$$y_{70} = 250 + 1.5\times120 + \frac{1.5(1.5-1)}{1\times2}(-20) + \frac{1.5(1.5-1)(1.5-2)}{1\times2\times3}(-10) + \frac{1.5(1.5-1)(1.5-2)(1.5-3)}{1\times2\times3\times4}(20)$$

= 250 + 180 − 7.5 + .625 + .46875 = 423.59375
the reqd. number of earners between Rs. 60 and Rs. 70


= 423.59375 − 370 = 53.59375 thousands
= 53,594 (app.).
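
Newton's Forward Formula lends itself to a short program. The Python sketch below is a minimal version under the stated condition of equidistant arguments; the function name newton_forward is an assumption, and the data are the cumulative frequencies of the wages example.

```python
def newton_forward(xs, ys, x):
    """Newton's forward formula for equidistant arguments xs."""
    h = xs[1] - xs[0]
    u = (x - xs[0]) / h          # (figure of interpolation - origin) / interval
    diffs = list(ys)             # current column of the difference table
    result = 0.0
    term = 1.0                   # u(u-1)...(u-k+1) / k!
    for k in range(len(ys)):
        result += term * diffs[0]            # leading entry of the column
        term *= (u - k) / (k + 1)
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return result

wages = [40, 60, 80, 100, 120]            # upper wage limits (Rs.)
cum_freq = [250, 370, 470, 540, 590]      # persons earning below the limit ('000)

below_70 = newton_forward(wages, cum_freq, 70)
print("earning below Rs. 70:", below_70, "thousands")
print("earning between Rs. 60 and Rs. 70:", below_70 - 370, "thousands")
```

Each pass of the loop adds one term of the formula, using only the leading entry of the current difference column, so the whole difference table never has to be stored.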


Newton's Backward Formula. This formula is suitable
when the figure to be interpolated is lying near the end of the
tabulated values, and the values of the argument are equidistant.
Newton's Backward Formula is expressed as follows :

$$y_x = y_n + x\,\Delta y_{n-1} + \frac{x(x+1)}{1\times2}\,\Delta^2 y_{n-2} + \frac{x(x+1)(x+2)}{1\times2\times3}\,\Delta^3 y_{n-3} + \cdots + \frac{x(x+1)(x+2)\cdots(x+n-1)}{n!}\,\Delta^n y_0$$

where $y_x$ is the figure to be interpolated,
$y_n$ is the value at the origin (the last entry),
$\Delta$'s are the differences between adjoining values,
x is calculated as follows :

$$x = \frac{\text{figure of interpolation} - \text{figure of last entry}}{\text{distance between adjoining figures}}$$

This is called the Backward Formula since, starting from $y_n$, it uses
values of y backward and none forward. The formula uses the differences
at the bottom of the different columns.


Example.


Find out the value of y for x = 18 from the
following table :

Let us construct the differences table :


Differences


Here, $x = \dfrac{18-20}{5} = -.4$


Now, putting the above values in the backward formula we find,

$$y_{18} = 18 + (-.4)\times1 + \frac{(-.4)(-.4+1)}{1\cdot2}\,(-3) + \frac{(-.4)(-.4+1)(-.4+2)}{1\cdot2\cdot3}\,(-1)$$

= 18 − .4 + .36 + .064 = 18.024
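
The backward formula can be programmed in the same manner as the forward one. In the Python sketch below the function name newton_backward is an assumption; for illustration it is applied to the wages table of the forward-formula example to estimate the number earning below Rs. 110, a figure lying near the end of that table and not worked in the text.

```python
def newton_backward(xs, ys, x):
    """Newton's backward formula for equidistant arguments xs."""
    h = xs[1] - xs[0]
    u = (x - xs[-1]) / h         # (figure of interpolation - last entry) / interval
    diffs = list(ys)             # current column of the difference table
    result = 0.0
    term = 1.0                   # u(u+1)...(u+k-1) / k!
    for k in range(len(ys)):
        result += term * diffs[-1]           # entry at the bottom of the column
        term *= (u + k) / (k + 1)
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return result

wages = [40, 60, 80, 100, 120]            # upper wage limits (Rs.)
cum_freq = [250, 370, 470, 540, 590]      # persons earning below the limit ('000)

below_110 = newton_backward(wages, cum_freq, 110)
print("earning below Rs. 110:", below_110, "thousands")
```

The only changes from the forward version are the origin (the last entry), the factors x, x + 1, x + 2, ... and the use of the bottom entry of each difference column.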


(2) Lagrange's Formula.
This formula is applicable for series of unequal intervals.
The formula is expressed as follows :

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)} + y_1\,\frac{(x-x_0)(x-x_2)\cdots(x-x_n)}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)} + \cdots + y_n\,\frac{(x-x_0)(x-x_1)\cdots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})}$$

where $y_x$ is the quantity to be interpolated,
x is the quantity for which the value of y is to be found,
$x_0, x_1, x_2, x_3, \ldots$ are the values of the x-series,
$y_0, y_1, y_2, y_3, \ldots$ are the values of the y-series.

Example.
Determine the percentage of criminals under 35 years of age.
Age                 Percentage of criminals
Under 25 yrs.              52.0
Under 30 yrs.              67.3
Under 40 yrs.              84.1
Under 50 yrs.              94.4          [B.Com. Nagpur 1963]


Age                            % of criminals
Under 25 yrs.     x0     52.0       y0
Under 30 yrs.     x1     67.3       y1
Under 40 yrs.     x2     84.1       y2
Under 50 yrs.     x3     94.4       y3


From Lagrange's formula,

$$y_x = y_0\,\frac{(x-x_1)(x-x_2)(x-x_3)}{(x_0-x_1)(x_0-x_2)(x_0-x_3)} + y_1\,\frac{(x-x_0)(x-x_2)(x-x_3)}{(x_1-x_0)(x_1-x_2)(x_1-x_3)} + y_2\,\frac{(x-x_0)(x-x_1)(x-x_3)}{(x_2-x_0)(x_2-x_1)(x_2-x_3)} + y_3\,\frac{(x-x_0)(x-x_1)(x-x_2)}{(x_3-x_0)(x_3-x_1)(x_3-x_2)}$$

Putting the respective values in the formula, we find,

$$y_{35} = 52.0\,\frac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} + 67.3\,\frac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)} + 84.1\,\frac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} + 94.4\,\frac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)}$$

$$= 52\,\frac{5\,(-5)(-15)}{(-5)(-15)(-25)} + 67.3\,\frac{10\,(-5)(-15)}{5\,(-10)(-20)} + 84.1\,\frac{10\cdot5\,(-15)}{15\cdot10\,(-10)} + 94.4\,\frac{10\cdot5\,(-5)}{25\cdot20\cdot10}$$

= −10.4 + 50.475 + 42.05 − 4.72 = 77.405%

the reqd. percentage of criminals under 35 yrs. = 77.41% (app.)


Example.


Given, log 654 = 2.8156, log 658 = 2.8182, log 659 = 2.8189,
log 661 = 2.8202, find log 656 (all are in common logarithms).
[I.A.S. 1956]

                   x               y
log 654      x0   654       2.8156   y0
log 658      x1   658       2.8182   y1
log 659      x2   659       2.8189   y2
log 661      x3   661       2.8202   y3


Using Lagrange's Formula, and substituting the values, we get,

$$y_{656} = 2.8156\,\frac{(656-658)(656-659)(656-661)}{(654-658)(654-659)(654-661)} + 2.8182\,\frac{(656-654)(656-659)(656-661)}{(658-654)(658-659)(658-661)} + 2.8189\,\frac{(656-654)(656-658)(656-661)}{(659-654)(659-658)(659-661)} + 2.8202\,\frac{(656-654)(656-658)(656-659)}{(661-654)(661-658)(661-659)}$$

$$= 2.8156\,\frac{(-2)(-3)(-5)}{(-4)(-5)(-7)} + 2.8182\,\frac{(2)(-3)(-5)}{(4)(-1)(-3)} + 2.8189\,\frac{(2)(-2)(-5)}{(5)(1)(-2)} + 2.8202\,\frac{(2)(-2)(-3)}{(7)(3)(2)}$$

= .6033 + 7.0455 − 5.6378 + .8059
= 2.8169

the required estimated value of log 656 = 2.8169.


Example.


The following table gives the normal weight of a baby during the
first six months of life :


Age in months      0    2    3    5    6
Weight in lb.      5    7    8   10   12


Estimate the weight of a baby at the age of 4 months.
[I.C.W.A. Jan. 1970]


Age in months          Weight in lb.
0     x0                 5     y0
2     x1                 7     y1
3     x2                 8     y2
5     x3                10     y3
6     x4                12     y4

$$y_4 = 5\,\frac{(4-2)(4-3)(4-5)(4-6)}{(0-2)(0-3)(0-5)(0-6)} + 7\,\frac{(4-0)(4-3)(4-5)(4-6)}{(2-0)(2-3)(2-5)(2-6)} + 8\,\frac{(4-0)(4-2)(4-5)(4-6)}{(3-0)(3-2)(3-5)(3-6)} + 10\,\frac{(4-0)(4-2)(4-3)(4-6)}{(5-0)(5-2)(5-3)(5-6)} + 12\,\frac{(4-0)(4-2)(4-3)(4-5)}{(6-0)(6-2)(6-3)(6-5)}$$

$$= 5\,\frac{(2)(1)(-1)(-2)}{(-2)(-3)(-5)(-6)} + 7\,\frac{(4)(1)(-1)(-2)}{(2)(-1)(-3)(-4)} + 8\,\frac{(4)(2)(-1)(-2)}{(3)(1)(-2)(-3)} + 10\,\frac{(4)(2)(1)(-2)}{(5)(3)(2)(-1)} + 12\,\frac{(4)(2)(1)(-1)}{(6)(4)(3)(1)}$$

$$= \frac{1}{9} - \frac{7}{3} + \frac{64}{9} + \frac{16}{3} - \frac{4}{3} = \frac{1 - 21 + 64 + 48 - 12}{9} = \frac{80}{9} = 8\tfrac{8}{9}$$

the required estimated weight of a baby is $8\tfrac{8}{9}$ lb.


(3) Method of Binomial Expansion.


This method requires some calculation and is applicable subject to
the following two conditions :


(1) The variable x should increase uniformly, say, 2, 4, 6, 8, ...
etc. The method is not applicable otherwise.


(2) The value of x for which y is to be interpolated should be one
of the class-limits of the x-series. For example,


We can find the value of y for x = 6, but not for x = 7 or 9. Again
we can find the value of y for x = 12 but not for x = 11 or 13.


Expanding the binomial and equating it to zero, we find,

\[
(y-1)^n = y^n - n\,y^{n-1} + \frac{n(n-1)}{1\times 2}\,y^{n-2}
          - \frac{n(n-1)(n-2)}{1\times 2\times 3}\,y^{n-3} + \dots = 0,
\]

where n is the number of known values of y and the powers of y are
read as suffixes.

for n = 3   \( y_3 - 3y_2 + 3y_1 - y_0 = 0 \)
    n = 4   \( y_4 - 4y_3 + 6y_2 - 4y_1 + y_0 = 0 \)
    n = 5   \( y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \)
    n = 6   \( y_6 - 6y_5 + 15y_4 - 20y_3 + 15y_2 - 6y_1 + y_0 = 0 \)

Example.


The following table gives the amount of cement in thousands of
tonnes manufactured in the year x. Find the missing term.

Year (x)                           1956   '58   '60   '62   '64   '66
Cement manf. (thousand tonnes)      39     85    ?    151   264   388

[I.C.W.A. July 1966]

Here, the known values are 5, ∴ the fifth leading difference will
be 0. Symbolically, it is expressed as

\( \Delta^5 y_0 = 0 \), of which the binomial expansion is expressed as
\( y_5 - 5y_4 + 10y_3 - 10y_2 + 5y_1 - y_0 = 0 \)

Now,

   x       y
 1956     39    y0
  '58     85    y1
  '60      ?    y2
  '62    151    y3
  '64    264    y4
  '66    388    y5

Substituting the values, we find,
388 - 5 × 264 + 10 × 151 - 10y2 + 5 × 85 - 39 = 0
or, 388 - 1320 + 1510 - 10y2 + 425 - 39 = 0 or, y2 = 96.4
∴ the probable amount is 96.4 thousand tonnes.


Example.

The age of mothers and the average number of children born per
mother are given in a table below. Interpolate the average number of
children born per mother aged (30-34).

Age of mother (in yrs.)    Average no. of children born
        15-19                        0.7
        20-24                        2.1
        25-29                        3.5
        30-34                         ?
        35-39                        5.7
        40-44                        5.8

[M.Com. Agra 1968]

Here, y0 = 0.7, y1 = 2.1, y2 = 3.5, y3 = ?, y4 = 5.7, y5 = 5.8.

Since the known figures are five, the fifth leading difference will
be zero.

Now, \( \Delta^5 y_0 = 0 \)
or, y5 - 5y4 + 10y3 - 10y2 + 5y1 - y0 = 0
or, 5.8 - 5 × 5.7 + 10y3 - 10 × 3.5 + 5 × 2.1 - 0.7 = 0
(substituting values)
or, 5.8 - 28.5 + 10y3 - 35 + 10.5 - 0.7 = 0
or, -47.9 + 10y3 = 0 or, 10y3 = 47.9 or, y3 = 4.79
∴ the probable average no. of children = 4.79 or 5 (app.).
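The binomial-expansion method of the two examples above amounts to solving one linear equation with binomial coefficients of alternating sign. The Python sketch below shows this under stated assumptions: the helper name `missing_term` and the use of `None` to mark the unknown value are illustrative choices, not part of the text.

```python
from math import comb


def missing_term(ys):
    """Find the single missing value (marked None) in an equally spaced series.

    Solves sum over k of (-1)**(n - k) * C(n, k) * y_k = 0, where n is the
    number of known values, i.e. the n-th leading difference is set to zero.
    """
    if ys.count(None) != 1:
        raise ValueError("exactly one value must be missing")
    n = len(ys) - 1                  # number of known values
    m = ys.index(None)               # position of the missing term
    known_sum = 0.0
    for k, y in enumerate(ys):
        if k != m:
            known_sum += (-1) ** (n - k) * comb(n, k) * y
    coefficient = (-1) ** (n - m) * comb(n, m)
    return -known_sum / coefficient


cement = missing_term([39, 85, None, 151, 264, 388])
children = missing_term([0.7, 2.1, 3.5, None, 5.7, 5.8])
print(round(cement, 1), round(children, 2))
```

As in the text, the sketch assumes the x-values are equally spaced and that the missing value sits at one of the tabulated positions.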


EXERCISE 4 


1. If $l_x$ represents the numbers living at age x in a life table,
find as accurately as the data will permit, $l_x$ for values of x = 35,
42 and 47 given $l_{20}$ = 512, $l_{30}$ = 439, $l_{40}$ = 346, $l_{50}$ = 243.

[I.A.S. 1948] (Ans. 394, 326, 274)

2. Estimate by Newton's method of interpolation, the expectation
of life at age 22 from the following data, stating the assumption
underlying the formula used by you :

Age                    10     15     20     25     30     35
Expectation of life
(in yrs.)             35.4   32.2   29.1   26.0   23.1   20.4

[I.A.S. 1949, 1965] (Ans. 27.85 yrs.)


3. From the following table, find the number of students who 
obtained less than 45 marks : 


Marks No. of Students Marks No. of Students 
30—40 31 60—70 35 
40—50 49 70—80 31 
50—60 51 


[I.A.S. 1967 ; I.C.W.A. Jan. 1965] (Ans. 48 students)


4. The wages earned by workers per month in a certain
factory are given below. Calculate the number of workers
earning more than Rs. 75 per month.

Monthly income      No. of workers
up to Rs.  50             50
 "   "  "  60            150
 "   "  "  70            300
 "   "  "  80            500
 "   "  "  90            700
 "   "  " 100            800

[B.Com. Nagpur 1963] (Ans. 404)


5. From the following data estimate the number of persons in 
the income group of Rs. 20 to Rs. 25. 


Income            Number of persons
below Rs. 10            20
  "    "  20            45
  "    "  30           115
  "    "  40           210
  "    "  50           325

[B.Com. Nagpur 1969] (Ans. 31)


6. Find y for x = 2 from the following table,

x :   0    1    3    4    5
y :  39   85  151  264  388

[I.C.W.A. Jan. 1969] (Ans. 96.4)


7. Estimate by a suitable method of interpolation the number of
persons whose income exceeds Rs. 19 but does not exceed Rs. 25 from
the following data :

Income in Rs.                 No. of persons
 1 and not exceeding 10             50
10  "   "      "     19             70
19  "   "      "     28            203
28  "   "      "     37            406
37  "   "      "     46            304

[M.A. Rajasthan, 1965] (Ans. 107)


8. The following data show the monthly average number of 
deaths under one year in a certain large city. Find the missing term : 


Year                  1960   1961   1962   1963   1964
No. of deaths
(monthly average)      940     ?     907    843    798

[I.C.W.A. Jan. 1972] (Ans. 952)

9. The following values are given in a table :


Using any suitable algebraic method, find the value of y for x = 3.
[I.A.S. 1953] (Ans. 2,38,328)


10. Mantissae of four numbers are given below :

Numbers       4200        4210        4220        4230
Mantissae   62,32,493   62,42,821   62,53,125   62,63,404

Find the mantissa of the logarithm of 4,213.
[I.C.W.A. July 1968] (Ans. 62,45,915)

11. Mention a formula which will help interpolation when
observations are shown to be at unequal intervals.

The observed values of a function are respectively 168, 120, 72
and 63 at the four positions 3, 7, 9 and 10 of the independent variable.
What is the best estimate you can give for the value of the function
at the position 6 of the independent variable ? (Ans. 147)


12. The following figures relate to the number of estates liable
to estate duty in a particular year :

Class of estate                  Number liable
Rs. 25,000 - Rs. 30,000              638
 "  30,000 -  "  40,000              740
 "  40,000 -  "  60,000              415

Estimate the number between Rs. 31,000 and Rs. 32,000 by
interpolation. (Ans. 85)


13. Comment on the necessity and usefulness of interpolation.
Describe the graphic method of interpolation. [I.C.W.A. Jan. 1970]


14. From the following table find the interpolated figure for the
population in 1946.

Years                   1930     1940     1950     1960
Population of a town   25,494   29,003   32,528   36,070

[I.C.W.A. 1962] (Ans. 31,116)


15. The population of a country in the decennial census was as
under. Estimate the population for 1955 :

Years                 1921   1931   1941   1951   1961
Population (in '000)   46     66     81     93    101


[I.0.W.A. 1963] ( Ans, 99°56 thousands ) 


16. The following are the amounts of income-tax paid by a few
businessmen during one year :

Income-tax paid           No. of businessmen
more than Rs.   500             600
  "    "    "  1,000            550
  "    "    "  1,500            425
  "    "    "  2,000            275
  "    "    "  2,500            100
  "    "    "  3,000             25

Find out the number of businessmen who paid more than
Rs. 1,200 but not more than Rs. 2,400 as income-tax.

[I.C.W.A. July 1965] (Ans. 369)


17, The following table gives the normal weight of babies during 
the first twelve months of life : 


age (months) Din Bt: © Eb 


weight (lb.) 7 10 15 16°48 21 


Find the weight of a 7 months old baby. (Ans. 15.66 lbs.)


18. From the following table, by using Newton’s backward 
interpolation formula, find the value of y corresponding to w= 38. 


( Ans, 21°598 ) 


19. Applying Newton's backward formula, find the value of #=32 
from the table given below : 


( Ans. 84°9816 ) 
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20. Find out with the help of E the value of the production for 
the year 1974 from the following table : 


teas 1970 | 1971 | 1972 | 1973 


Roe fe ae pe 


(Ans. 28 in thousand tons ) 


21. Find f(x) given that f(0) = -3, f(1) = 6, f(2) = 8, f(3) = 12
(State your assumption, if any). Hence find f(6).

[I.C.W.A. June 1976] (Ans. 126)

22. Below are given the values of a function $U_x$ for certain
values of x :

eho 1 p} 8 4 

Uei gyi 0.5 2% 57 
Construct the table of differences. What does this table suggest ?
Use this table to find $U_5$. [I.C.W.A. Dec. 76] (Ans. 116)


23. State Lagrange's interpolation formula. Use it to find the
value of $U_4$ of a function $U_x$, given that $U_1$ = 10, $U_2$ = 15, $U_5$ = 42.

[I.C.W.A. Dec. 76] (Ans. 31)

24, Hstimate Us from the following table : 
yA | 2 3 4 
Us: 7 #3013 a1 

State the necessary assumptions made. 
[1.C.W.A. June 77] . (Ans. 9°5 ) 


25. The following table gives the expectation of life ($e_x^{\circ}$) at age x.
Calculate expectation of life at age 12 by using Newton's forward
interpolation formula :

x :                10     15     20     25     30     35
$e_x^{\circ}$ :   35.4   32.2   29.1   26.0   23.1   20.4

[I.C.W.A. Dec. '77] (Ans. 34.174)


26. Discuss the difference between Newton's forward and backward
interpolation formulae. Given,

log10 654 = 2.8156, log10 658 = 2.8182, log10 659 = 2.8189,
log10 661 = 2.8202, find by Lagrange's interpolation formula
log10 656 (retain four decimal places in your answer).

[I.C.W.A. June 78] (Ans. 2.8168)


27. Explain what you understand by the symbolic operator E.
Given the following table, construct a difference table and from it
estimate y when x = 0.35 by using Newton's backward interpolation
formula.

x :   0     0.1     0.2     0.3     0.4
y :   1    1.095   1.179   1.251   1.310

(Answer to be given correct to 3 dec. places)
[I.C.W.A. June 78] (Ans. 1.282)


28. Apply the appropriate interpolation formula to find log 3.146
given log 3.141 = 0.4970679, log 3.142 = 0.4972062, log 3.143 = 0.4973444,
log 3.144 = 0.4974825, log 3.145 = 0.4976205 (find correct up to seven
decimal places). [I.C.W.A. Dec. '78] (Ans. 0.4977584)


29. State Lagrange's interpolation formula. The mode of a
certain frequency curve y = f(x) is attained at x = 9.1 and the values of
the frequency function f(x) for x = 8.9, 9.0 and 9.3 are respectively
equal to 0.30, 0.35 and 0.25. Calculate the approximate value of f(x)
at the mode. [I.C.W.A. Dec. '78] (Ans. 0.36)


80. Given (45)=0°7071, f(50)=0°7660, _/(55)=0'8192, and 
(60) = 08698, find (59) correct to 4 gee of decimal. 
[I. ©. W. A. June 79] ( Ans. 0°8688 ) 


31. Find with the help of the symbolic operator E the value of
log10 666 from the following table :
log10 654 = 2.8156, log10 658 = 2.8182, log10 662 = 2.8209.
[I.C.W.A. June 79] (Ans. 2.8237)
32. The values of a function f(z) are given below for some 
specified values of w : 
@: 3 4 5 9 
Ka): 6 5 ae 30 
Using an appropriate interpolation formula, find the value of (7). 
[C. U. B. Com. (Hons.) 1980] ( Ans. —10) 


33. Find the missing term in the following table :
x :   0   1   2   3   4
y :   1   3   9   *   81
[I.C.W.A. June 1980] (Ans. 31)


FREQUENCY DISTRIBUTION


Observation, Frequency 


Suppose the weekly wages (in Rs.) of 60 workers, in a certain
factory, are collected by an investigator either from the office-records
or by personal interviews.


The collected data are as follows (in Rs.) :

45, 20, 50, 10, 25, 28, 17, 28, 45, 30, 9, 18, 37, 45, 32,
45, 41, 35, 37, 45, 35, 32, 40, 37, 40, 45, 32, 45, 17, 17,
9, 18, 28, 35, 32, 47, 20, 25, 28, 26, 25, 17, 19, 30, 35,
32, 26, 41, 26, 20, 30, 10, 10, 40, 20, 28, 50, 50, 30, 40.


Here the variable observed is the ‘weekly wages’ and the data 
obtained are the observations or observed values. 


The raw data, as recorded, appear in a complex and arbitrary
manner. One cannot fully grasp the true significance of the figures at
first sight. So some modifications are necessary. Therefore, the
data should be arranged in a definite order, either ascending or
descending.


Here the above data are arranged in ascending order as follows :

9, 9, 10, 10, 10, 17, 17, 17, 17, 18, 18, 19, 20, 20, 20,
20, 25, 25, 25, 26, 26, 26, 28, 28, 28, 28, 28, 30, 30, 30,
30, 32, 32, 32, 32, 32, 35, 35, 35, 35, 37, 37, 37, 40, 40,
40, 40, 41, 41, 45, 45, 45, 45, 45, 45, 45, 47, 50, 50, 50.


The 60 observations are not all different; some of them are
repeated. The distinct observations are known as the values of the
variable.

The above arrangement can also be represented in the form of a
table as shown below :


TABLE I

Weekly wages   No. of      Weekly wages   No. of
   (Rs.)       workers        (Rs.)       workers

     9            2             30           4
    10            3             32           5
    17            4             35           4
    18            2             37           3
    19            1             40           4
    20            4             41           2
    25            3             45           7
    26            3             47           1
    28            5             50           3

A characteristic which can be expressed numerically is called a
variate or variable. The number of times each value of the variate
occurs is known as its frequency. A frequency table is a chart consisting
of the values of the variate with their respective frequencies. A
classification showing different values of a variate and the corresponding
frequencies is known as a frequency distribution.

In the above table, weekly wages are the variates and the number
of workers getting the same wage is the frequency. Here 9 occurs
2 times, 10 occurs 3 times, etc. Frequencies of the values (of variate)
9, 10, ..., etc. are respectively 2, 3, ..., etc.
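Counting how often each wage occurs is exactly what a simple frequency table records. A minimal Python sketch follows, assuming the 60 wage figures are keyed in as a list called `wages` (a name chosen here for illustration); a few doubtful digits are read so that the counts agree with Table I.

```python
from collections import Counter

# The 60 weekly wages (Rs.) from the text, as read for this illustration.
wages = [
    45, 20, 50, 10, 25, 28, 17, 28, 45, 30, 9, 18, 37, 45, 32,
    45, 41, 35, 37, 45, 35, 32, 40, 37, 40, 45, 32, 45, 17, 17,
    9, 18, 28, 35, 32, 47, 20, 25, 28, 26, 25, 17, 19, 30, 35,
    32, 26, 41, 26, 20, 30, 10, 10, 40, 20, 28, 50, 50, 30, 40,
]

# Counter maps each distinct value of the variate to its frequency.
frequency = Counter(wages)

# Print the simple frequency table in ascending order of the variate.
for value in sorted(frequency):
    print(value, frequency[value])
```

Sorting the keys reproduces the ascending arrangement used in Table I, and the frequencies add up to the total number of observations.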


Frequency Distribution, Simple and Grouped. 
There are two types of frequency distribution : 
(1) Simple frequency distribution ; 
(2) Grouped frequency distribution. 


(1) Simple frequency distribution.
This shows the values of the variate individually. For example,
Table I (shown above).


(2) Grouped frequency distribution.
The above arrangement of data (shown in Table I) is suitable for
a small number of figures. Now suppose there are a huge number of
figures, say, 1,000; then the above method of arrangement will not be
helpful to the statistician for application of any kind of mathematical
principles.

In such cases, the values of the variate may be shown in groups
or intervals, giving rise to a grouped frequency distribution, as shown
below :


TABLE II : Grouped Frequency Distribution

   Variate
(weekly wages)          Frequency
   (in Rs.)          (no. of workers)
from  1 to 10               5
  "  11  " 20              11
  "  21  " 30              15
  "  31  " 40              16
  "  41  " 50              13
      Total                60
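Grouping can be sketched in Python as follows. The helper `grouped_frequencies` and its parameters are hypothetical names used only for this illustration; the data are rebuilt from the value and frequency pairs of Table I, and the classes are the inclusive intervals 1 to 10, 11 to 20, and so on.

```python
def grouped_frequencies(values, lower, width, n_classes):
    """Count values in the inclusive classes lower..lower+width-1, and so on."""
    counts = [0] * n_classes
    for v in values:
        index = (v - lower) // width          # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{v} lies outside the class-intervals")
        counts[index] += 1
    return counts


# Value -> frequency pairs of Table I.
table_1 = {9: 2, 10: 3, 17: 4, 18: 2, 19: 1, 20: 4, 25: 3, 26: 3, 28: 5,
           30: 4, 32: 5, 35: 4, 37: 3, 40: 4, 41: 2, 45: 7, 47: 1, 50: 3}
values = [v for v, f in table_1.items() for _ in range(f)]

class_frequencies = grouped_frequencies(values, lower=1, width=10, n_classes=5)
print(class_frequencies)
```

The integer division works here because the wages are whole rupees; for measurements with decimals the class boundaries would have to be used instead.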


A Few Terms (associated with grouped frequency distribution) :
(a) Class-interval.
(b) Class-frequency, total frequency.
(c) Class-limits (upper and lower).
(d) Class-boundaries (upper and lower).
(e) Mid-value of class-interval.
(f) Width of class-interval.
(g) Frequency density.
(h) Percentage frequency.


(a) Class-interval.


Each group into which the values of the variate are divided is
called a class-interval. In Table II, 1-10 is the first class-interval.

If one end of a class-interval is not given, then it is known as an
open-end class. There may be two open-end classes. For example, less
than 15, 15-20, 20-25, above 25. A class-interval having zero
frequency is known as an empty class.


(b) Class-frequency, Total frequency.


The number of observations (frequency) in a particular class-interval
is known as class-frequency. In Table II, for the class-interval
1-10, class-frequency is 5; for the class-interval 11-20,
class-frequency is 11 and so on. The sum of all class-frequencies is called
the total frequency. In the Table, it is 60. Total frequency is the
total number of observations.


(c) Class-limits.


The two ends of a class-interval are called class-limits. Of a
particular class-interval, the smaller and greater numbers are known
as lower and upper class-limits respectively. In Table II, for the first
class, class-limits are 1 and 10, where the lower class-limit is 1 and the
upper class-limit is 10. For the next class, lower and upper class-limits
are 11 and 20 respectively.


(d) Class-boundaries.


The class-boundaries may be calculated from the class-limits by
the following rule :


lower class-boundary = lower class-limit - ½d,
upper class-boundary = upper class-limit + ½d,


where d = common difference between the upper class-limit of any
class-interval and the lower class-limit of the next class-interval.


In Table II, d = 1; for the first class-interval,
lower class-boundary = 1 - ½ × 1 = 1 - 0.5 = 0.5,
upper class-boundary = 10 + ½ × 1 = 10 + 0.5 = 10.5.
Again, for the next class-interval, lower class-boundary = 10.5
and upper class-boundary = 20.5 and so on.
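The rule above can be sketched in Python. The function name `class_boundaries` and the representation of classes as (lower limit, upper limit) pairs are assumptions made for this illustration only.

```python
def class_boundaries(limits):
    """Turn (lower_limit, upper_limit) pairs into class-boundaries.

    d is the gap between the upper limit of one class and the lower
    limit of the next; half of it is moved to each side.
    """
    d = limits[1][0] - limits[0][1]
    return [(lower - d / 2, upper + d / 2) for lower, upper in limits]


table_2_limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
boundaries = class_boundaries(table_2_limits)
print(boundaries)
```

When the upper limit of one class already equals the lower limit of the next, d is zero and the boundaries coincide with the limits.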


Example.


Find the class-boundaries of (i) 20-24.9, 25-29.9, ......
(ii) 20-25, 25-30, ......


(i) d = 0.1, class-boundaries are 19.95-24.95, 24.95-29.95, ......
(ii) d = 0, class-boundaries are 20-25, 25-30, ......

(e) Mid-value.

The value exactly at the middle of a class-interval is known as
its mid-value. It is calculated by adding the two class-limits and dividing
by 2 (or, adding the two class-boundaries and dividing by 2).

In Table II, for the first class-interval, mid-value = (1 + 10)/2 = 5.5,
for the second class-interval, mid-value = (11 + 20)/2 = 15.5 and so on.


(f) Width.


The width (or size) of a class-interval is the difference between
the class-boundaries (not class-limits) :


width = upper class-boundary - lower class-boundary.


In Table II, for the first class, width = 10.5 - 0.5 = 10,
for the second class, width = 20.5 - 10.5 = 10, and so on.


(g) Frequency density.


It is the ratio of the class-frequency to the width of that
class-interval, i.e.,


Frequency density = class-frequency / width of the class.


In Table II, for the first class, frequency density = 5/10 = 0.5,
for the second class, frequency density = 11/10 = 1.1 and so on.


(h) Percentage frequency.
It is the ratio of class-frequency to total frequency expressed as
percentage, i.e.,

Percentage frequency = (class-frequency / total frequency) × 100
(= Relative frequency × 100).


In Table II, for the class-frequency 5, % frequency = (5/60) × 100 = 8.33,
for the class-frequency 11, % frequency = (11/60) × 100 = 18.33 and so on.


Now all these terms are illustrated in the following table with
reference to Table II.


TABLE III : Illustration of Class-limits, Class-boundaries, Mid-value, Width, etc.
( Data : Table II )

Class-     Class-      Class-limits     Class-boundaries   Mid-value   Width of   Frequency   Percentage
interval   frequency   lower   upper    lower    upper     of class    class      density     frequency
  (1)        (2)        (3)     (4)      (5)      (6)        (7)        (8)         (9)         (10)
 1-10         5          1      10       0.5     10.5        5.5        10          0.5         8.33
11-20        11         11      20      10.5     20.5       15.5        10          1.1        18.33
21-30        15         21      30      20.5     30.5       25.5        10          1.5        25.00
31-40        16         31      40      30.5     40.5       35.5        10          1.6        26.67
41-50        13         41      50      40.5     50.5       45.5        10          1.3        21.67
Total        60          -       -        -        -          -          -           -        100.00
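The derived columns for the classes of Table II (mid-value, width, frequency density and percentage frequency) follow mechanically from the class-limits and class-frequencies. A short Python sketch is given below; the variable names are illustrative assumptions, and d = 1 is the gap between consecutive classes of Table II.

```python
# (lower limit, upper limit, class-frequency) for each class of Table II.
classes = [(1, 10, 5), (11, 20, 11), (21, 30, 15), (31, 40, 16), (41, 50, 13)]
d = 1                                   # gap between consecutive classes
total = sum(f for _, _, f in classes)   # total frequency

rows = []
for lower, upper, f in classes:
    lower_b, upper_b = lower - d / 2, upper + d / 2   # class-boundaries
    mid_value = (lower + upper) / 2
    width = upper_b - lower_b
    density = f / width
    percentage = round(100 * f / total, 2)
    rows.append((mid_value, width, density, percentage))

for row in rows:
    print(row)
```

Note that the width is taken from the class-boundaries, not the class-limits, which is why each class has width 10 rather than 9.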


Discrete and Continuous Series.


In Table II, the first group ends at Rs. 10, while the second
group starts at Rs. 11, and there is a gap of Re. 1 between these
groups. A similar gap lies between any other two groups. So there is a
discontinuity in the series, usually known as a discrete series. By slight
alteration of the groups, a discrete series may be converted into a
continuous series. Instead of writing the first group as Re. 1 to Rs. 10,
write it as Re. 1 and less than Rs. 11; similarly for the second group,
Rs. 11 and less than Rs. 21, etc.


Hence Table II can be written as a Continuous Series as follows :


TABLE IV : Continuous Series ( Data : Table II )
Frequency distribution of weekly wages obtained by 60 workers

Weekly wages (Rs.)        No. of workers
    (variate)               (frequency)
 1 and less than 11              5
11  "   "    "   21             11
21  "   "    "   31             15
31  "   "    "   41             16
41  "   "    "   51             13
      Total                     60
Guialbge Discrete and continuous serios are also discussed in the chapter of 


Compilation of Frequency Distribution Table from Raw Data.
Instead of going through so many stages, we can directly get
Table II from the raw data collected, with the help of Tally Marks.
In counting values, a vertical line (|) is used for one; four
vertical lines crossed by a diagonal line (||||/) are used for every five
counts belonging to the same group, as shown below :

TABLE V : Frequency distribution of weekly wages

Weekly wages
   (Rs.)          Tally marks               Frequency

   1-10           ||||/                          5
  11-20           ||||/ ||||/ |                 11
  21-30           ||||/ ||||/ ||||/             15
  31-40           ||||/ ||||/ ||||/ |           16
  41-50           ||||/ ||||/ |||               13
  Total                 -                       60


Construction of Frequency Distribution Table.


The process of constructing a frequency distribution table from
raw data is as follows :

1. The smallest and largest figures are to be located first and
then the range is found (i.e., the difference between the largest and
smallest figures).

2. Distribute the range in a suitable number of class-intervals.
The number of these class-intervals should preferably be between 6
and 15, depending on the number of observations. However, there is
no rigidity about it. The class-intervals may be more than 15,
depending upon the total number of observations.

The class-limits should be chosen in such a way that most of the
observations lie within the class-limits. Class-intervals of the same
width are preferred. Of course, there may be unequal widths in
some cases.

3. The number of observations (i.e., frequency) lying in each
class-interval is determined by Tally Marks.

4. The construction of the table will be complete by placing
class-intervals in the first column, tally-markings in the second column, and
corresponding class-frequencies in the third column.
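Steps 1 and 2 can be sketched numerically. One common working rule for the number of classes is Sturges' formula, n = 1 + 3.3 log N, with N the total frequency; the Python function names below are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def sturges_classes(total_frequency):
    """Sturges' working rule: n = 1 + 3.3 * log10(N)."""
    return 1 + 3.3 * math.log10(total_frequency)


def approximate_width(values, n_classes):
    """Approximate class width: range of the data divided by the number of classes."""
    return (max(values) - min(values)) / n_classes


# For the 60 weekly wages, the smallest figure is 9 and the largest is 50.
n = sturges_classes(60)
width = approximate_width([9, 50], 5)
print(round(n, 2), round(width, 2))
```

The rule only suggests a starting point; in practice the result is rounded to a convenient whole number and the width to a convenient figure such as 5 or 10.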


Note. Choice of number of class-intervals. The number of class-intervals should
neither be too large (as the frequency distribution will be very large) nor too small (as
essential characteristics of the distribution will not be revealed). As a working rule,
the number lies between 6 and 15. For a smaller number of observations, Sturges' formula
may also be used, n = 1 + 3.3 log N, where n = the number of classes, N = total
frequency. To ensure continuity and to get correct class-intervals, we should adopt the
exclusive method of classification.

Choice of Class-limits.


The lower limit of the first class-interval should be either 0 or 5 or a
multiple of 5. For example, if the lowest value is 23, then for a width of
class-interval 5, the first class should be 20-25 (instead of 19-24).
The approximate width of the classes may be obtained by dividing the
range of the data by the number of classes. Further, the class-limits
should be so chosen that the observations occurring most frequently lie
within the class-limits, preferably near the mid-value of the class.


Example.


The following is an array of 65 marks obtained by students in a
certain examination :

26, 45, 27, 50, 45, 32, 36, 41, 31, 41, 48, 27, 46,
47, 31, 34, 42, 45, 31, 28, 27, 49, 48, 47, 32, 33,
35, 37, 47, 28, 46, 26, 46, 31, 35, 33, 42, 31, 41,
45, 42, 44, 41, 36, 37, 39, 51, 54, 53, 38, 55, 39,
52, 38, 54, 36, 37, 38, 56, 59, 61, 65, 64, 72, 64.


Draw up a frequency distribution table classified on the basis of
marks with class-intervals of 5.


TABLE VI : Tally Sheet

Class-intervals
of marks          Tally marks            Frequency
 25-29            ||||/ ||                    7
 30-34            ||||/ ||||/                10
 35-39            ||||/ ||||/ |||            13
 40-44            ||||/ |||                   8
 45-49            ||||/ ||||/ |||            13
 50-54            ||||/ |                     6
 55-59            |||                         3
 60-64            |||                         3
 65-69            |                           1
 70-74            |                           1
 Total                  -                    65

Now the required frequency distribution is shown below :

TABLE VII : Frequency distribution of marks obtained by 65 students

 Marks      Frequency
 25-29          7
 30-34         10
 35-39         13
 40-44          8
 45-49         13
 50-54          6
 55-59          3
 60-64          3
 65-69          1
 70-74          1
 Total         65


Cumulative Frequency Distribution.


It is a form of frequency distribution in which each frequency,
beginning with the second from the top, is added to the total of
the previous ones, the class-intervals being adjusted accordingly.


Example.


TABLE VIII : Cumulative frequency distribution showing the marks.
(Data : reference Table VII)

 Marks      Frequency     Cumulative Frequency
 25-29          7                  7
 30-34         10                 17
 35-39         13                 30
 40-44          8                 38
 45-49         13                 51
 50-54          6                 57
 55-59          3                 60
 60-64          3                 63
 65-69          1                 64
 70-74          1                 65

The cumulative frequency up to 34 (up to the upper class-limit
of the second class-interval) is obtained by adding the frequency of
the second class to that of the previous class and so on. This kind
of cumulative frequency is known as 'less than type' cumulative
frequency, when addition is done from the top. Conversely, if the addition
is done from below, then it will be 'greater than type' cumulative
frequency.

It may be noted that the less than cumulative frequency
corresponding to the highest class-boundary and the greater than
cumulative frequency corresponding to the lowest class-boundary must each
be equal to the total frequency. Also, the sum of these two
types of cumulative frequencies at any value of the variate is the total
frequency.
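Both types of cumulative frequency can be produced from the class-frequencies by running totals. In the Python sketch below the helper name `cumulative_frequencies` is an assumption for illustration, and the marks frequencies are those of the tally sheet, with the 50-54 class taken as 6 so that the total is 65.

```python
from itertools import accumulate


def cumulative_frequencies(frequencies):
    """Return (less_than, greater_than) lists, one entry per class-boundary.

    With k classes there are k + 1 boundary points, from the lowest
    class-boundary to the highest.
    """
    less_than = [0] + list(accumulate(frequencies))
    total = less_than[-1]
    # At every boundary the two types add up to the total frequency.
    greater_than = [total - c for c in less_than]
    return less_than, greater_than


marks_frequencies = [7, 10, 13, 8, 13, 6, 3, 3, 1, 1]
less_than, greater_than = cumulative_frequencies(marks_frequencies)
print(less_than)
print(greater_than)
```

The first list starts at 0 and ends at the total frequency, while the second runs the other way, which mirrors the remark above about the lowest and highest class-boundaries.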


Uses of Cumulative Frequency.


It is used (a) to find the number of observations less than or
greater than any given value of the variate ; (b) to find the number of
observations lying within any particular class-interval ; (c) to find median,
quartiles, deciles and percentiles graphically (will be discussed in the
chapter of Average).


Example. 


From the following table find (a) the less than and (b) greater
than cumulative frequencies, (c) cumulative frequency distribution,
(d) cumulative percentage distribution.

Wages (Rs.)   11-20   21-30   31-40   41-50   51-60   61-70   Total
Frequency       5       7      12      15       8       3      50


The class-boundaries of the class-intervals are respectively
10.5-20.5, 20.5-30.5, ......, etc. (ref. Table III). The boundary points are
10.5, 20.5, 30.5, ......, etc. There is no frequency below 10.5, so its
cumulative frequency is 0 ; the frequency below 20.5 is 5, the
frequency below 30.5 is 12 (= 5 + 7), the frequency below 40.5 is
24 (= 12 + 12) and so on. This is less than cumulative frequency.
For greater than type, we are to start adding from the end. Now
corresponding to the class-boundaries 70.5, 60.5, 50.5, 40.5, ..., etc. the
greater than cumulative frequencies are 0, 3, 11 (= 3 + 8),
26 (= 11 + 15), ..., etc.


Cumulative frequency distribution consists of the variates
(class-boundary points only) with the corresponding less than cumulative
frequencies.

TABLE IX : Cumulative Frequencies

                   Cumulative frequencies
Wages (Rs.)     (less than)    (greater than)
   10.5              0               50
   20.5              5               45
   30.5             12               38
   40.5             24               26
   50.5             39               11
   60.5             47                3
   70.5             50                0


Cumulative frequency and Cumulative percentage distributions

                Cumulative frequency    Cumulative percentage
Wages (Rs.)         (less than)             (less than)
   10.5                  0                       0
   20.5                  5                      10
   30.5                 12                      24
   40.5                 24                      48
   50.5                 39                      78
   60.5                 47                      94
   70.5                 50                     100


Example.


Present the following data of the percentage marks of 60 students. 
in the form of a Frequency Table with 10 classes of equal width, one 
class being 40—49. 


41 17 83 63 54 92 60 =—«58 70 ~=«(06 
67 «82 33 44 57 49 34 73 «54 63 
3652 82 75 60 8309 19a 30 
42 93 43 80 03 32 5767 24 = «64 
63 11 35. 82 10 23 00 41 60. | 82 
72. 53 92 88 62. 55. 60 38 40° BF 


[C.A. 1966]

Here the minimum mark is 00, the maximum mark is 93, and the width of
the class-interval is 10 (from the given class 40-49).


Frequency distribution of marks obtained by 60 students

 Marks      Frequency
 00-09          4
 10-19          3
 20-29          3
 30-39         10
 40-49          7
 50-59          9
 60-69         11
 70-79          5
 80-89          5
 90-99          3
 Total         60


——_—— 


Example. 


Age at death of 50 persons of a town are given below— 
Re es Sa SU OE 2 
36 48 50 45 49 31 50 48 43 42 
37 32 40 89 41 “47 45 39 43 47 
88" 39787 dQ 8a Bg 5681 54 886 
51 46° 4055) 58. 81 4g? Bg 324 
53 86 «6460 «659 04's 36 38 §60 


(a) Arrange the data in a frequency distribution in 10
class-intervals ; and


(b) Obtain the percentage frequency in each class.
[B. Com. C. U. 1972]


Here the range is 60 - 31 = 29 and 10 classes are required, so the width
of the classes should be 29/10 = 2.9 or 3. Taking 3 as the width of the
classes, the Frequency Distribution Table is drawn.

Frequency Distribution Table and Percentage frequency


Example. 


If the class mid-points in a frequency distribution of weights
of students are 128, 137, 146, 155, 164, 173 and 182 pounds, find
(a) the class-interval size, (b) the class-limits. [C. A. May 1964]


The difference between any two consecutive class mid-points is 9,
which is the class-interval size. Taking class-boundaries, the class-limits
of the first class are 123.5 (= 128 - 4.5) and 132.5 (= 128 + 4.5). In the
same way, the other class-limits will be 132.5-141.5, 141.5-150.5,
150.5-159.5, 159.5-168.5, 168.5-177.5, 177.5-186.5.
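The reasoning of this example, half the common difference subtracted from and added to each mid-point, can be sketched in Python. The function name `boundaries_from_midpoints` is a hypothetical label for this illustration, and equal spacing of the mid-points is assumed, as in the example.

```python
def boundaries_from_midpoints(midpoints):
    """Return (class size, list of (lower, upper) boundaries) for equally spaced mid-points."""
    size = midpoints[1] - midpoints[0]
    if any(b - a != size for a, b in zip(midpoints, midpoints[1:])):
        raise ValueError("mid-points are not equally spaced")
    half = size / 2
    return size, [(m - half, m + half) for m in midpoints]


size, limits = boundaries_from_midpoints([128, 137, 146, 155, 164, 173, 182])
print(size)
print(limits)
```

Each upper boundary coincides with the lower boundary of the next class, so the resulting classes form a continuous series.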


EXERCISE 5 


1. Discuss the various steps in the preparation of frequency
distribution from raw data. [C. U. M. Com. 1969]


2. Discuss the problems in the construction of a frequency
distribution from raw data, with particular reference to the choice of
number of classes and class-limits. [I. C. W. A. Jan. 1972]


3. What do you mean by a cumulative frequency distribution ?
Point out its special advantages and uses. [I. C. W. A. Jan. 1971]

4. Explain the terms : class-interval, class-limits, class mid-point
and class-frequency. [C. A. Nov. 1964]
5. Taking the class-limits 5-9, 10-14, 15-19, ..., etc., construct
a frequency distribution of the following set of observations :
7 27 10 19 39-84 24 24 41 20 
23 44 47 36 53 20 16 45 23 22 
10 13 31 1l 30 21 31 22 28 17 
27 32 42 20 15 34 1 29 44 21 
59 36 22 18 97 23 a1 25 17 28 
34 23 48 32 49 29 2 52 43 40 
33 37 40 22 14 38 28 23 25 
a7 16 29 20 17 23 19 23 45 35 
22 33 15 AB ie 25 38 24 22 13 27 
12 24 19 9 12 24 380 35 387 22 


(Ans. 2, 8, 12, 30, 14, 10, 9, 7, 5, 2, 1) [B.A. (Hons.) C.U. 1964]
6. From the following observations, prepare a Frequency Distribution
Table in ascending order starting with 5-10 (exclusive method) :
marks in English :
12 3640 3088 20 19 10 10 16 
19 27 15 26 20 19 7 85 33 21 


26 37 5-20 11 17 37 80 20 5 


(Ans. frequencies 3, 4, 6,5, 4,8, 4). [B.Com., Bangalore, 1968 } 
7. You are given below the wages paid to some workers in a 
small factory. Form a frequency distribution with class-interval 


10 paise. 

(Wages in Rs.) { 
Sg SS geet ated abate ce ee es 
110 113 144 «+4144 %4197 #+4117 «+198 136 1°30 
127 (124 «-1°78_—sd1'51ss1'12ss1'42)Ss1'08—s-1'58 146 
140 121 162 1°31 15 1°38 1°04 1°48 1°20 
160 #170 4109 149 2186 1°95 1°50 182 1°42 
129 164 138 187 414i 177 «#115 1°57 1°07 
165 136 167 #141 #4155) 1:99 169 #167 «(1°34 
145 139 195 «6196 «61°75 —°57 153 «187169. 
119 152 156 1°32 1°81 1°40 147° «138. «1°62 
176-128 «192 «146 «6-146 =~ -1°35 116 46142 «1°78 
1°68. 147 137 1°35 1°47 143.166 1°56. 1°48 
ETA OG, sp AB ey FB 

(Ans. class-intervals : 1.01-1.10, 1.11-1.20, ..., etc. ;
frequencies : 5, 7, 10, 15, 18, 14, 9, 5, 4, 3) [C.A. May 1967]

8. Construct a frequency distribution showing the frequencies
with which words of different number of letters occur in the extract
reproduced below (omitting punctuation marks), treating as the variable
the number of letters in each word :

'A candidate at the time of applying for registration as a student
of the institute should be not less than eighteen years of age and have
passed the Intermediate Examination of a University constituted by
law in India or an examination recognized by the Central Government
as equivalent thereto, or the National Diploma in Commerce Examination
or the diploma in Rural Services Examination conducted by the
National Council of Rural Higher Education.'

[I. C. W. A. July 1967]


(Ans. Number of letters
in a word    1    2    3   4   5   6   7   8   9   10   11   12   Total
frequency    3   19   12   4   4   3   6   6   4    4    5    2     72)
9. The following table gives the scholastic aptitude scores of 


the 50 departmental students of a certain department in a certain 
university : 


845 530 556 354 590 
895 515 479 494 420 
563 444 629 440 485 
505 604 490 445 605 
402 406 730 506 516 
472 475 610 586 528 
691 520 465 468 545 
624 582 570 578 505 
523 575 420 605 527 
461 440 585 420 384 


(i) Construct a frequency distribution table with appropriate
class-limits and class-boundaries (take the length of the class equal
to 30 units).

(ii) Draw histogram to represent the above frequency distribution.
[I. C. W. A. Dec. 1978]

(Ans. (i) class-limits : 345-374, 375-404, ...
class-boundaries : 344.5-374.5, 374.5-404.5, ...
frequency : 2, 3, 4, 5, 8, 8, 3, 6, 7, 2, 0, 1, 1
(ii) using the class-boundaries, histogram is to be drawn.)

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10. The weights in Kilogram of 50 persons are given below : 


Toren Gane wmer a. Ob 66 Te ea Es 46 
Bohne Seam BomtnOOG NAT, 95D 674, 187), 44. 84 
64 74 48 iB CT:Cé‘~iSY:(C(‘<éS‘CAKZ]C“‘ésON”~SCO#O‘2L 58 
4252 62 72 48 #68 TL 64 «+58 ~~ 67 
4B oo 6b 75. 486) 89 OF 17 64 78 


Arrange the above data in a frequency distribution with class-interval
of 5 kg. Construct the frequency polygon on a graph paper
with above data. [C.U. B.Com. (Pass) 1980]


11. Marks obtained by 50 students in a History paper of full 
marks 100 are as follows : 


eee 9b a5 apt) sg) Mag egmatetgg! Ells. 4g 
PAE A 548 2a AS AB BGS dB 48. AT 
3G 60.) Si 4788 6b B89. «18 
BOG 900 47 40s bl 8B Aa 85 52 SL 
SETG . G7ON 850 GiieTou O40 BARE 89° <e3e0. 46 


Arrange the data in a Frequency Distribution Table in class-intervals
of length 5 units. Draw a histogram to present the above data.

[I.C.W.A. June 1980]


12. The marks scored by 50 students in Geography are as 
follows : 




80 45 48 55 39 25 31 12 18 a1 


Bh b9s ible V3 43, 4d 0 8819 a6 


4) 35 87 41 46. 33 51 387 58 48 
17 19 23 26 29 38 57 36 35 44 
43 27 31 43 22 31 47 34 18 15 


Prepare a Frequency Table with 5 class-intervals each of width 
10 marks and hence draw the ogive of both types. 


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Introduction. 


We have seen in the earlier chapter how statistical data are
condensed to a large extent by tabulation. For practical purposes,
tabulation is not enough, especially when we are to compare two or
more series of data. In order to make them comparable, it is essential
to reduce the figures of each series to one figure. For example, it is
required to compare the daily wages obtained by 100 workers belonging to a
factory A with the daily wages of 100 workers of factory B. It would
be impossible to arrive at any conclusion, if these two series are
directly compared. Now, if each of these series is represented by one
figure, comparison would be an extremely easy affair.


Statistical data when represented by graphs or diagrams appeal
more to the eye than to the mind. Unless we are able to describe the
main theme of a series or what it tends to suggest, we cannot deal with
the series adequately. There is a necessity for some single measurement
which may give the summary description of the characteristics of
a large group of variables.

In most frequency distributions, we find that the tabulated values
show small frequencies at the extreme classes, while at the middle part
the frequency is highest. This indicates that most of the items of the
series cluster near the central part of the distribution. The figures
which describe this central part are known as Measures of Central
Tendency or Averages. An average represents a whole series; as such
its value lies between the minimum and maximum values and generally
it is located in the centre of the distribution.


The object of an average is to represent a number of variates in a
simple and concise manner. So it is a representative figure of the entire
data. Secondly, it is a basis of comparison with other groups.


Types of Average.


Broadly speaking, there are three types of Measures of Central
Tendency (or Average) :


1. MEAN 2. MEDIAN 3. MODE


Mean, again, is divided into three types :


1. Arithmetic Mean (A.M.)
2. Geometric Mean (G.M.)
3. Harmonic Mean (H.M.)


Unless specially mentioned, the term mean generally refers to
the arithmetic mean. It has the maximum application amongst these
three types. Median and Mode are known as Averages of Position,
while Mean is known as Mathematical Average.


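Use of Sigma ($\Sigma$) Notation.

The sum $x_1 + x_2 + x_3 + \cdots + x_n$ is often denoted by $\sum_{i=1}^{n} x_i$ or $\sum x$.

Similarly, the sum $p_1x_1 + p_2x_2 + p_3x_3 + \cdots + p_nx_n$ may be
denoted by $\sum p_i x_i$ or $\sum px$.

The quotient $\dfrac{p_1x_1 + p_2x_2 + \cdots + p_nx_n}{p_1 + p_2 + \cdots + p_n}$ is denoted by $\dfrac{\sum px}{\sum p}$.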
Arithmetic Mean (A.M.).
Definition : The Arithmetic Mean of the values of a variate
$x_1, x_2, x_3, \ldots, x_n$ is the sum of the values divided by their number.
Now if $\bar{X}$ denotes the A.M. of the quantities, then

$$\bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

This mean (A.M.) is known as Simple Arithmetic Mean.
Example.

To find the A. M. of the numbers 2, 5, 9, 11, 13.

Here n (the total number of items) = 5.

Now, A.M. $= \dfrac{2 + 5 + 9 + 11 + 13}{5} = \dfrac{40}{5} = 8$.


Weighted Arithmetic Mean (Weighted Mean).

Definition : If the n values of a variate $x_1, x_2, x_3, \ldots, x_n$ are
taken $f_1, f_2, f_3, \ldots, f_n$ times respectively (i.e., if $f_1, f_2, f_3, \ldots, f_n$ are
the respective frequencies of $x_1, x_2, x_3, \ldots, x_n$), then

$$\bar{X} = \frac{f_1x_1 + f_2x_2 + f_3x_3 + \cdots + f_nx_n}{f_1 + f_2 + f_3 + \cdots + f_n} = \frac{\sum fx}{\sum f}$$
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Example. 


To find average income from the following table :

Daily income (Rs.)    2    5    9   11   13
No. of workers        2    1    4    2    1

Variate (x)         Frequency (f)        fx
(income in Rs.)     (no. of workers)
     (1)                 (2)          (3) = (1) x (2)
      2                   2                 4
      5                   1                 5
      9                   4                36
     11                   2                22
     13                   1                13
    Total                10                80

Weighted Mean (or average income) $= \dfrac{\sum fx}{\sum f} = \dfrac{80}{10} =$ Rs. 8.00

Note. Here, $x_1 = 2$, $x_2 = 5$, $x_3 = 9$, $x_4 = 11$, $x_5 = 13$ and
$f_1 = 2$, $f_2 = 1$, $f_3 = 4$, $f_4 = 2$, $f_5 = 1$.
This method of computing A. M. is known as Direct Method.
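The Direct Method can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `weighted_mean` and the variable names are assumptions made for the example, not part of the text.

```python
def weighted_mean(values, freqs):
    """Direct Method: sum of f*x divided by sum of f."""
    total_f = sum(freqs)
    if total_f == 0:
        raise ValueError("total frequency must be non-zero")
    return sum(f * x for x, f in zip(values, freqs)) / total_f


incomes = [2, 5, 9, 11, 13]   # daily income in Rs. (x)
workers = [2, 1, 4, 2, 1]     # number of workers (f)

average_income = weighted_mean(incomes, workers)
# Simple A.M. is the special case in which every frequency is 1.
simple_mean = weighted_mean(incomes, [1] * len(incomes))
print(average_income, simple_mean)
```

Both calls give 8 for this data: the weighted mean because the sum of fx is 80 over 10 workers, the simple mean because the values add up to 40 over 5 items.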
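Important Property of Arithmetic Mean.
(1) The algebraic sum of the deviations of the values from their
arithmetic mean is zero.
Proof. (a) (Simple) Arithmetic Mean :

The differences $x_1 - \bar{X},\ x_2 - \bar{X},\ x_3 - \bar{X},\ \ldots,\ x_n - \bar{X}$ (taken with
their signs) are called the deviations of $x_1, x_2, x_3, \ldots, x_n$ respectively from
the mean $\bar{X}$. Now,

$$\begin{aligned}
\sum (x - \bar{X}) &= (x_1 - \bar{X}) + (x_2 - \bar{X}) + (x_3 - \bar{X}) + \cdots + (x_n - \bar{X}) \\
&= (x_1 + x_2 + \cdots + x_n) - (\bar{X} + \bar{X} + \cdots \text{ n times}) \\
&= \sum x - n\bar{X} = n\bar{X} - n\bar{X} \qquad (\because\ \bar{X} = \textstyle\sum x / n) \\
&= 0
\end{aligned}$$

(b) Weighted Arithmetic Mean : [C. U. B. Com. (Hons.) 1980]

$$\begin{aligned}
\sum f(x - \bar{X}) &= f_1(x_1 - \bar{X}) + f_2(x_2 - \bar{X}) + \cdots + f_n(x_n - \bar{X}) \\
&= f_1x_1 + f_2x_2 + \cdots + f_nx_n - (f_1\bar{X} + f_2\bar{X} + \cdots + f_n\bar{X}) \\
&= \sum fx - (f_1 + f_2 + \cdots + f_n)\bar{X} \\
&= \sum fx - \sum fx = 0 \qquad (\text{as } \bar{X} = \textstyle\sum fx / \sum f)
\end{aligned}$$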


Example.
A. M. of 2, 5, 9, 11, 13 is 8.

Now, the deviations are : (2-8), (5-8), (9-8), (11-8), (13-8),
i.e., -6, -3, 1, 3, 5 whose sum is -6-3+1+3+5 = -9+9 = 0.


Example.

Weighted A. M. of 2, 5, 9, 11, 13 having frequencies 2, 2, 3, 2, 1
is 7.6 (by the process shown before).

Deviations are : (2-7.6), (5-7.6), (9-7.6), (11-7.6), (13-7.6),
i.e., -5.6, -2.6, 1.4, 3.4, 5.4.
Again 2(-5.6) + 2(-2.6) + 3(1.4) + 2(3.4) + 1(5.4)
= -11.2 - 5.2 + 4.2 + 6.8 + 5.4 = -16.4 + 16.4 = 0.

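(2) Prove that for a given set of observations the sum of the
squares of deviations is the minimum, when deviations are taken from
the arithmetic mean. [I. C. W. A. Jan. 1971]

$x_i - A = (x_i - \bar{X}) + (\bar{X} - A)$, where $x_i$ are n observations,
$\bar{X}$ is actual mean,
A is any arbitrary constant.

Now $\sum (x_i - A) = \sum (x_i - \bar{X}) + \sum (\bar{X} - A)$

$$\begin{aligned}
\text{And } \sum (x_i - A)^2 &= \sum (x_i - \bar{X})^2 + \sum (\bar{X} - A)^2 + 2\sum (x_i - \bar{X})(\bar{X} - A) \\
&= \sum (x_i - \bar{X})^2 + n(\bar{X} - A)^2 + 2(\bar{X} - A)\sum (x_i - \bar{X}), \text{ as } (\bar{X} - A) \text{ is constant} \\
&= \sum (x_i - \bar{X})^2 + n(\bar{X} - A)^2 \qquad [\text{as } \textstyle\sum (x_i - \bar{X}) = 0 \text{ by prop. (1)}]
\end{aligned}$$

Now $\sum (x_i - A)^2 \ge \sum (x_i - \bar{X})^2$, as $n(\bar{X} - A)^2$ is never negative.

The sign of equality holds when and only when $\bar{X} = A$.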
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Example. 


   x       x - 6     (x - 6)^2
   2        -4          16
   4        -2           4
   6         0           0
   8         2           4
  10         4          16
Total: 30    -          40

A.M. $= \dfrac{2 + 4 + 6 + 8 + 10}{5} = \dfrac{30}{5} = 6$.


Now if the deviations are taken from any other value then the
sum of the squares of deviations would be greater than 40.
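The minimum property can be checked numerically. The sketch below uses the values 2, 4, 6, 8, 10 and a few trial origins chosen purely for illustration; the helper name `sum_sq_dev` is an assumption of this example.

```python
def sum_sq_dev(values, origin):
    """Sum of squared deviations of the values from a chosen origin."""
    return sum((x - origin) ** 2 for x in values)


values = [2, 4, 6, 8, 10]
n = len(values)
mean = sum(values) / n                      # 6.0

at_mean = sum_sq_dev(values, mean)          # deviations taken from the A.M.
others = {a: sum_sq_dev(values, a) for a in (4, 5, 7, 8)}

# Property (1): the deviations from the mean add up to zero.
sum_of_deviations = sum(x - mean for x in values)

# Identity used in the proof: sum (x - A)^2 = sum (x - mean)^2 + n (mean - A)^2
identity_holds = all(
    others[a] == at_mean + n * (mean - a) ** 2 for a in others
)
print(at_mean, others, sum_of_deviations, identity_holds)
```

Every trial origin other than 6 gives a larger sum of squares, and the excess equals n times the squared distance between the origin and the mean, as in the proof.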


(3) If $\bar{X}_1$, $\bar{X}_2$ be the means of two groups having observations $N_1$,
$N_2$ respectively, then the mean ($\bar{X}$) of the composite group $N (= N_1 + N_2)$
is given by the relation $N\bar{X} = N_1\bar{X}_1 + N_2\bar{X}_2$.
This may be generalised for any number of groups :
$N\bar{X} = N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3 + \cdots$.
Note. This property has been discussed with illustrations under the heading
of Mean of Composite Group later on.


Short-cut Method of Determining Mean (Method of Assumed Means).
(1) Simple A.M. :
If $d_i$ (i = 1, 2, ..., n) be the deviations of the n observations
$x_1, x_2, \ldots, x_n$ from any arbitrary value A (as near as possible to the
true mean), then $d_i = x_i - A$ or $x_i = A + d_i$.

$$\begin{aligned}
\text{Now } \bar{X} &= \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{(A + d_1) + (A + d_2) + \cdots + (A + d_n)}{n} \\
&= \frac{nA + (d_1 + d_2 + \cdots + d_n)}{n} = \frac{nA + \sum d}{n} = A + \frac{\sum d}{n}
\end{aligned}$$

Alternatively :
x = (x - A) + A = d + A
or $\sum x = \sum (d + A) = \sum d + \sum A = \sum d + nA$
or $\dfrac{\sum x}{n} = \dfrac{\sum d}{n} + \dfrac{nA}{n}$
or $\bar{X} = A + \dfrac{\sum d}{n}$

Example. To find A.M. of 2, 5, 9, 11, 13 by short-cut method.

Let A (assumed mean) = 9. The deviations d = x - A are -7, -4, 0, 2, 4,
so that $\sum d = -5$.

A.M. $= A + \dfrac{\sum d}{n} = 9 + \dfrac{-5}{5} = 9 - 1 = 8$

Note. If A is taken as 11 or 5, we would get the same result, i.e., if
the value of Origin (A) is changed, A.M. remains the same.
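The short-cut method, and the fact that the result does not depend on the assumed mean, can be sketched as follows. The function name is an assumption for illustration only.

```python
def mean_by_assumed_origin(values, assumed):
    """Short-cut method: A + (sum of deviations from A) / n."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)


values = [2, 5, 9, 11, 13]

# Try the three origins mentioned in the Note: 9, 11 and 5.
results = [mean_by_assumed_origin(values, a) for a in (9, 11, 5)]
print(results)
```

Each origin gives the same mean, since a change in A is exactly offset by the change in the average deviation.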


(2) Weighted Mean.

$$\begin{aligned}
\text{We have, } \bar{X} &= \frac{f_1x_1 + f_2x_2 + \cdots + f_nx_n}{N}, \text{ where } \textstyle\sum f = N \\
&= \frac{f_1(A + d_1) + f_2(A + d_2) + \cdots + f_n(A + d_n)}{N} \\
&= \frac{(f_1A + f_2A + \cdots + f_nA) + (f_1d_1 + f_2d_2 + \cdots + f_nd_n)}{N} \\
&= \frac{A(f_1 + f_2 + \cdots + f_n) + \sum fd}{N} \\
&= \frac{AN + \sum fd}{N} = A + \frac{\sum fd}{N} = A + \frac{\sum fd}{\sum f}
\end{aligned}$$

Example.
Let A (= assumed mean) = 9

Now A.M. $= A + \dfrac{\sum fd}{\sum f} = 9 + \dfrac{-14}{10} = 9 - 1.4 = 7.6$.
Note. If A is taken as 11 or 5, we would get the same result. So if the
value of origin (A) is changed, mean is unchanged.


Step Deviation Method.


In this method, the only additional point is that we divide the
deviations by a common factor (usually the width of class-interval in
case of grouped data of equal width, or the H.C.F. of the deviations
taken) and multiply the result by the same common factor. This is for
simplifying the calculation.


The formula stands :

A.M. $(\bar{X}) = A + \dfrac{\sum fd'}{\sum f} \times i$, where A = assumed mean, f = frequency,
$d' = \dfrac{x - A}{i} = \dfrac{d}{i}$, i = common factor.


Note. This formula can be proved by putting $d = d' \times i$ in the above formula.


Alternatively :

Let $u_i = \dfrac{x_i - A}{d}$, where A and d are constants, i.e., variates $x_i$ have been changed
to new variates $u_i$.

Now $x_i = A + du_i$
or $f_ix_i = f_i(A + du_i)$, multiplying both sides by $f_i$
or $\sum f_ix_i = \sum f_i(A + du_i)$, taking aggregate ($\sum$) of both sides
$= \sum (f_iA + df_iu_i) = \sum f_iA + \sum df_iu_i$
$= A\sum f_i + d\sum f_iu_i$, as A and d are constants
$= AN + d\sum f_iu_i$

or, $\dfrac{\sum f_ix_i}{N} = A + d\,\dfrac{\sum f_iu_i}{N}$, dividing by $\sum f\ (= N)$
or, $\bar{X} = A + d\bar{u}$.
Example. 
To find A.M. from the following table : 
x : 10 20 30 40 50 60


ames (Aid 6/7 ldo 7 Bee 
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Calculation of A.M. : 


(3) 


- 380 


— 20 


Let A (assumed mean) = 40, 


In the above Table two common factors (i.e., two scales) 10 and 5
have been taken to scale down the data and shown separately in the
Table.


For the scale i = 10,

A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 40 - 4 = 36$.


Again for the other scale i = 5,

A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 40 - 4 = 36$.


We get the same result for different scales.


Calculation of Mean from Grouped Data (Continuous Series).


In case of grouped data, for computing Arithmetic Mean, the
only additional work is to find the mid-value of each class-interval.
This is done by taking the Arithmetic Mean of the class-limits (or
class-boundaries). The other steps in the calculations remain the same.
The idea will be clear from the example given below. The calculation
may be done by applying any one of the following methods :
(i) Direct Method, (ii) Short-cut Method, (iii) Step Deviation Method.
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Example. 
Calculate the mean weight from the following table :

Weight (lbs.)      95-105   105-115   115-125   125-135   Total
No. of students      20        26        38        16      100

Wt. (lbs.)   No. of students (f)   Mid-value (x)   d = x - A     fd
 95-105             20                 100            -20       -400
105-115             26                 110            -10       -260
115-125             38                 120              0          0
125-135             16                 130             10        160
Total              100                  -               -       -500

Let A = 120

A.M. $= A + \dfrac{\sum fd}{\sum f} = 120 + \dfrac{-500}{100} = 120 - 5 = 115$ lbs.


Use of Step Deviation.


Example : Calculate the mean from the following table :

Monthly wages (Rs.)
of domestic servants   0-10  10-20  20-30  30-40  40-50  50-60  60-70  70-80  80-90
No. of servants          1     4     10     22     30     35     10      7      1

[C.A. Nov. 1962]

Class-interval   Mid-value   Frequency   d = x - A   d' = d/10    fd'
wages (Rs.)         (x)         (f)
   0-10              5           1          -50         -5        -5
  10-20             15           4          -40         -4       -16
  20-30             25          10          -30         -3       -30
  30-40             35          22          -20         -2       -44
  40-50             45          30          -10         -1       -30
  50-60             55          35            0          0         0
  60-70             65          10           10          1        10
  70-80             75           7           20          2        14
  80-90             85           1           30          3         3
  Total              -         120            -          -       -98

Let A = 55

A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 55 + \dfrac{-98}{120} \times 10 = 55 - 8.17$
= Rs. 46.83.
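The step deviation calculation for grouped data can be sketched as below. The function name is hypothetical. The first frequency is hard to read in the source and is taken here as 1, an assumption chosen because it is the value that makes the total frequency 120 as used in the working.

```python
def step_deviation_mean(classes, freqs, assumed, width):
    """A + (sum f d' / sum f) * i, with d' = (mid-value - A) / i."""
    mids = [(lo + hi) / 2 for lo, hi in classes]
    steps = [(m - assumed) / width for m in mids]
    total_f = sum(freqs)
    return assumed + sum(f * s for f, s in zip(freqs, steps)) / total_f * width


# Wage classes 0-10, 10-20, ..., 80-90 (Rs.)
classes = [(lo, lo + 10) for lo in range(0, 90, 10)]
# First entry (1) is an assumed reading; see the note above.
freqs = [1, 4, 10, 22, 30, 35, 10, 7, 1]

mean_wage = step_deviation_mean(classes, freqs, assumed=55, width=10)

# Cross-check with the Direct Method on the mid-values.
direct = sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)) / sum(freqs)
print(round(mean_wage, 2), round(direct, 2))
```

Both routes agree, which is the point of the method: scaling the deviations only shortens the arithmetic and does not alter the mean.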
Example. 


The following are the monthly salaries in rupees of 20 employees 
of a firm :— 


145 161 65 71 182 118 4142 6 85 95 


The firm gives bonuses of Rs. 10, 15, 20, 25 and 30 for individuals
in the respective salary groups : exceeding Rs. 60 but not exceeding
Rs. 80 ; exceeding Rs. 80 but not exceeding Rs. 100 and so on up to
exceeding Rs. 140 but not exceeding Rs. 160. Find the average bonus
paid per employee. [C.A. Nov. 1964]


From the monthly salaries of the employees, we find the number
of employees lying between the salary groups mentioned as follows :

Calculation of average bonus

Salary (Rs.)   Bonus in Rs. (x)   No. of employees (f)     fx
  60-80              10                    4                40
  80-100             15                    4                60
 100-120             20                    5               100
 120-140             25                    4               100
 140-160             30                    3                90
 Total                -                   20               390

A.M. $= \dfrac{\sum fx}{\sum f} = \dfrac{390}{20} =$ Rs. 19.50.


Calculation of A.M. from Unequal Width of Classes. 


The calculation is similar to that of the equal width of classes. 
The idea will be clear from the following example. 


Example. 


The Table given below shows the number of persons with different
incomes in U. S. A. during the year 1929 :

Income in thousands of dollars     No. of persons in lakhs
          0-1                               13
          1-2                               90
          2-3                               81
          3-5                              117
          5-10                              66
         10-25                              27
         25-50                               6
         50-100                              2
        100-1000                             2

Calculate the average income per head.

Income ('000      No. of persons     Mid-value    d = x - A      fd
dollars)          in lakhs (f)          (x)
    0-1                13               0.5          -7.0       -91.0
    1-2                90               1.5          -6.0      -540.0
    2-3                81               2.5          -5.0      -405.0
    3-5               117               4.0          -3.5      -409.5
    5-10               66               7.5           0.0         0.0
   10-25               27              17.5          10.0       270.0
   25-50                6              37.5          30.0       180.0
   50-100               2              75.0          67.5       135.0
  100-1000              2             550.0         542.5      1085.0
  Total               404                -             -        224.5

Let A = 7.5

A.M. $= A + \dfrac{\sum fd}{\sum f} = 7.5 + \dfrac{224.5}{404} = 7.5 + 0.56 = 8.06$ thousand dollars.


Calculation of A.M. in case of Open-end Classes. 


Open-end classes are those in which the lower class-limit of
the first class and the upper class-limit of the last class are not known.
The assumption for finding such class-limits depends upon the classes
following the first class up to the one preceding the last class. For
example :

Marks        No. of students
below 5           10
 5-10              7
10-15              5
15-20              9
above 20           9

In the example, the width of classes is uniform, so the appropriate
assumption would be that the lower limit of the first class is 0 and
the upper limit of the last class is 25. Thus the first class would
be 0-5 and the last class 20-25. Now the mean is to be calculated
by the usual process.
Let us take another example :

Marks        No. of students
below 5           10
 5-15              7
15-30              5
30-50              9
above 50           9

In the second class, width is 10, in the third class, width is 15
and in the fourth class, it is 20, i.e., width is increasing by 5. So the
appropriate assumption would be that the lower limit of first class is 0
and the upper limit of the last class is 75. In other words, the first
class would be 0-5 and the last class 50-75.

For class-intervals whose widths vary without any regular pattern, no
assumption is appropriate for finding open-end class-limits. In such
cases, calculation of median or mode is preferred to that of mean.


Finding of Missing Frequency.

The idea of finding the missing frequency will be clear from the
following example.

The A.M. of the following frequency distribution is 1.46.

No. of accidents           0     1     2     3     4     5   | Total
No. of days (frequency)   46   f1    f2    25    10     5   |  200

Find the values of $f_1$ and $f_2$.

Let A = 2.

   x       f      d = x - A       fd
   0      46         -2          -92
   1      f1         -1          -f1
   2      f2          0            0
   3      25          1           25
   4      10          2           20
   5       5          3           15
 Total   200          -       -(32 + f1)

A.M. $= A + \dfrac{\sum fd}{\sum f}$

or, $1.46 = 2 + \dfrac{-(32 + f_1)}{200}$

or, $-0.54 = \dfrac{-(32 + f_1)}{200}$

or, $108 = 32 + f_1$
or, $f_1 = 76$
Now $f_2 = 200 - (46 + 76 + 25 + 10 + 5) = 38$.
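The same two unknowns can be found from the two conditions "total frequency = N" and "sum of fx = mean x N". The sketch below is one possible arrangement; the function name `missing_two` and its arguments are assumptions made for this illustration.

```python
def missing_two(known, x1, x2, total, mean):
    """Solve for the frequencies of x1 and x2 given the total and the mean.

    known: list of (x, f) pairs whose frequencies are given.
    """
    remaining_f = total - sum(f for _, f in known)            # f1 + f2
    remaining_fx = mean * total - sum(x * f for x, f in known)  # x1*f1 + x2*f2
    f2 = (remaining_fx - x1 * remaining_f) / (x2 - x1)
    f1 = remaining_f - f2
    return f1, f2


known = [(0, 46), (3, 25), (4, 10), (5, 5)]
f1, f2 = missing_two(known, x1=1, x2=2, total=200, mean=1.46)
print(round(f1), round(f2))
```

Rounding is applied only for display, because the mean 1.46 is held as a floating-point number.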
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Checking Accuracy of Computations. 
The formula of 'Charlier's Check' is as follows :
$\sum f(d' + 1) = \sum fd' + \sum f$ [or $\sum f(d + 1) = \sum fd + \sum f$]
If this equation is not satisfied, it means that there is some
mistake in calculation.
Example.
Apply Charlier's Check in the following Table to find the mean :

Marks             0-5   5-10   10-15   15-20   20-25   Total
No. of students    10     7      5       9       9      40

Marks    No. of students   Mid. pt.   d = x - A   d' = d/5   fd'   d' + 1   f(d' + 1)
              (f)            (x)
 0-5          10             2.5        -10         -2       -20     -1       -10
 5-10          7             7.5         -5         -1        -7      0         0
10-15          5            12.5          0          0         0      1         5
15-20          9            17.5          5          1         9      2        18
20-25          9            22.5         10          2        18      3        27
Total         40              -           -          -         0      -        40

Let A = 12.5
From $\sum f(d' + 1) = \sum fd' + \sum f$,
Left side = 40 ; Right side = 0 + 40 = 40.

Hence the calculation is correct.

Now, A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 12.5 + \dfrac{0}{40} \times 5 = 12.5 + 0 = 12.5$ marks.
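Charlier's Check lends itself to a short sketch. The function below is an illustrative assumption (its name and return values are not from the text); it recomputes both sides of the check for the marks table, taking the frequency of the last class as 9, the value implied by the total of 40.

```python
def charlier_check(classes, freqs, assumed, width):
    """Return (left side, right side, mean) for Charlier's Check."""
    steps = [((lo + hi) / 2 - assumed) / width for lo, hi in classes]  # d'
    sum_f = sum(freqs)
    sum_fd = sum(f * d for f, d in zip(freqs, steps))
    left = sum(f * (d + 1) for f, d in zip(freqs, steps))
    right = sum_fd + sum_f
    mean = assumed + sum_fd / sum_f * width
    return left, right, mean


classes = [(0, 5), (5, 10), (10, 15), (15, 20), (20, 25)]
freqs = [10, 7, 5, 9, 9]   # last frequency assumed from the total of 40

left, right, mean = charlier_check(classes, freqs, assumed=12.5, width=5)
print(left, right, mean)
```

The two sides agree at 40, so the column of fd' may be trusted, and the mean works out to 12.5 marks.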


Mean of Composite Group.


If $\bar{X}_1$, $\bar{X}_2$ are the means of two groups having observations
$N_1$, $N_2$ respectively, then the mean ($\bar{X}$) of the composite group
$N (= N_1 + N_2)$ is given by the relation,

$N\bar{X} = N_1\bar{X}_1 + N_2\bar{X}_2$.

Note. For three groups of respective observations $N_1$, $N_2$, $N_3$ and means
$\bar{X}_1$, $\bar{X}_2$, $\bar{X}_3$, we have $N\bar{X} = N_1\bar{X}_1 + N_2\bar{X}_2 + N_3\bar{X}_3$.


Example.


The mean annual salary paid to all employees of a company
was Rs. 5,000. The mean annual salaries paid to male and female
employees were Rs. 5,200 and Rs. 4,200 respectively. Determine the
percentage of males and females employed by the company.

Here, $\bar{X}_1 = 5{,}200$, $N_1 = ?$, $\bar{X}_2 = 4{,}200$, $N_2 = ?$, $\bar{X} = 5{,}000$, $N = N_1 + N_2$.

Now, $(N_1 + N_2)\,5{,}000 = 5{,}200N_1 + 4{,}200N_2$

or, $5{,}000N_1 - 5{,}200N_1 = 4{,}200N_2 - 5{,}000N_2$
or, $-200N_1 = -800N_2$
or, $N_1 : N_2 = 800 : 200 = 4 : 1$

Percentage of male $= \dfrac{4}{4 + 1} \times 100 = \dfrac{4}{5} \times 100 = 80$,

and that of female $= \dfrac{1}{4 + 1} \times 100 = \dfrac{1}{5} \times 100 = 20$.
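The composite-group relation can be sketched in code as below. Both function names are assumptions for illustration; the second simply rearranges the relation for the share of the first group.

```python
def combined_mean(sizes, means):
    """Mean of the composite group: sum(N_k * mean_k) / sum(N_k)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


def share_of_first_group(mean1, mean2, overall):
    """From (N1 + N2) X = N1 X1 + N2 X2:  N1 / (N1 + N2) = (X - X2) / (X1 - X2)."""
    return (overall - mean2) / (mean1 - mean2)


male_share = share_of_first_group(5200, 4200, 5000)
female_share = 1 - male_share

# Check: 80 males and 20 females reproduce the overall mean salary.
check = combined_mean([80, 20], [5200, 4200])
print(male_share, check)
```

A share of 0.8 for males corresponds to the 80 per cent found above, and recombining the groups in that proportion returns the overall mean of Rs. 5,000.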


Example. 
Calculate the mean from the following frequency distribution : 


x :  2   3   5   6-8   9-11   12-14   15-20   21-26   27-32
f :  4   1   2    4     2       4       6       3       1



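The whole distribution is divided into 3 parts. For each part the
mean is to be calculated. Lastly the final mean is found by using the
relation for the composite group.

Part I

   x       f      fx
   2       4       8
   3       1       3
   5       2      10
 Total     7      21

A.M. $= \dfrac{\sum fx}{\sum f} = \dfrac{21}{7} = 3$

Part II (classes 6-8, 9-11, 12-14 ; mid-values 7, 10, 13)

A.M. $= \dfrac{\sum fx}{\sum f} = \dfrac{100}{10} = 10$

Part III (classes 15-20, 21-26, 27-32 ; mid-values 17.5, 23.5, 29.5)

A.M. $= \dfrac{\sum fx}{\sum f} = \dfrac{205}{10} = 20.5$

Now, arranging the data for 3 parts, we find
$N_1 = 7$, $N_2 = 10$, $N_3 = 10$, $N = 7 + 10 + 10 = 27$

$\bar{X}_1 = 3$, $\bar{X}_2 = 10$, $\bar{X}_3 = 20.5$, $\bar{X} = ?$
We find, $27\bar{X} = 7 \times 3 + 10 \times 10 + 10 \times 20.5$
$= 21 + 100 + 205 = 326$

∴ $\bar{X} = \dfrac{326}{27} = 12.07$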

Advantages and Disadvantages of Arithmetic Mean. 


Advantages : 
(i) It is easy to calculate and simple to understand.

(ii) For computing the mean, all the data are utilised. It can be
determined even when only the number of items and their
aggregate are known.

(iii) It is capable of further mathematical treatment.

(iv) It provides a good basis to compare two or more frequency
distributions.

(v) Mean does not necessitate the arrangement of data.


Disadvantages :

(i) It may give considerable weight to extreme items. Mean
of 2, 6, 301 is 103 and none of the values is adequately
represented by the mean 103.

(ii) In some cases, arithmetic mean may give misleading
impressions. For example, average number of patients
admitted in a hospital is 10.7 per day. Here mean is a
useful information, but does not represent the actual item.

(iii) It can hardly be located by inspection.
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Geometric Mean (G.M.). 


Definition : The geometric mean (G) of the n positive values of
a variate $x_1, x_2, x_3, \ldots, x_n$ is the n-th root of the product of the values, i.e.,

$G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$. It means,

$G = (x_1 \cdot x_2 \cdots x_n)^{1/n}$. Now taking logarithms on both sides we
find,

$$\log G = \frac{1}{n}\log (x_1 \cdot x_2 \cdots x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\sum \log x \quad \ldots (1)$$

∴ $G = \text{antilog}\left[\dfrac{1}{n}\sum \log x\right]$

Thus, from formula (1) we find that the logarithm of the
G.M. of $x_1, x_2, \ldots, x_n$ = A.M. of logarithms of $x_1, x_2, \ldots, x_n$.

Properties.
1. The product of n values of a variate is equal to the n-th
power of their G.M.
i.e., $x_1 \cdot x_2 \cdots x_n = G^n$ (it is clear from the definition).


2. The logarithm of G.M. of n observations is equal to the A.M.
of logarithms of n observations. [Formula (1) states it].


3. The product of the ratios of each of the observations to the
G.M. is always unity.
Taking G as geometric mean of observations $x_1, x_2, \ldots, x_n$ the
ratios of each observation to the geometric mean are
$\dfrac{x_1}{G}, \dfrac{x_2}{G}, \ldots, \dfrac{x_n}{G}$.
By definition, $G = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$ or $G^n = (x_1 \cdot x_2 \cdots x_n)$. Now
the product of the ratios,

$$\frac{x_1}{G} \cdot \frac{x_2}{G} \cdots \frac{x_n}{G} = \frac{x_1 \cdot x_2 \cdots x_n}{G^n} = \frac{G^n}{G^n} = 1$$

4. If $G_1, G_2, \ldots$ are the geometric means of different groups
having observations $n_1, n_2, \ldots$ respectively, then G.M. (G) of composite
group is given by

$G = \sqrt[N]{G_1^{\,n_1} \cdot G_2^{\,n_2} \cdots}$, where $N = n_1 + n_2 + \cdots$

i.e., $\log G = \dfrac{1}{N}\left[n_1 \log G_1 + n_2 \log G_2 + \cdots\right]$
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Example. 
Find the G. M. of 111, 171, 191, 212.
If G indicates the G. M. of the numbers, then
$G = \sqrt[4]{111 \times 171 \times 191 \times 212}$, here n = 4
$\log G = \frac{1}{4}(\log 111 + \log 171 + \log 191 + \log 212)$
$= \frac{1}{4}(2.0453 + 2.2330 + 2.2810 + 2.3263)$
$= \frac{1}{4}(8.8856) = 2.2214$
∴ G = antilog 2.2214 = 166.5.
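The logarithmic route to the G.M. can be sketched with the standard `math` module, which supplies `log10` in place of the printed log tables. The function name is an assumption for illustration.

```python
import math


def geometric_mean(values):
    """G = antilog of the mean of the common logarithms of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs positive values")
    mean_log = sum(math.log10(v) for v in values) / len(values)
    return 10 ** mean_log


numbers = [111, 171, 191, 212]
g = geometric_mean(numbers)

# Property 1: the product of the values equals G raised to the power n.
product = math.prod(numbers)
print(round(g, 1), math.isclose(g ** len(numbers), product))
```

The result agrees with the table-based figure of about 166.5; small differences in later decimals come from the four-figure logarithms used in the text.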


Weighted Geometric Mean.


Computation of the geometric mean of numbers when they are
weighted will be clear from the following example.


Example.


Find the G. M. of 111, 171, 191, 212 having weights 3, 2, 4,
5 respectively.

   x       f      log x      f log x
  111      3      2.0453      6.1359
  171      2      2.2330      4.4660
  191      4      2.2810      9.1240
  212      5      2.3263     11.6315
 Total    14        -        31.3574

$\log G = \dfrac{\sum f \log x}{\sum f} = \dfrac{31.3574}{14} = 2.2398$
∴ G = antilog 2.2398 = 173.7.


Example. 


The weighted geometric mean of the four numbers 8, 25, 17
and 30 is 15.3. If the weights of the first three numbers are 5, 3 and 4
respectively, find the weight of the fourth number.

[I.C.W.A. Jan. 1971]

Taking $f_4$ as the weight of the fourth number 30, we find,

   x       f       log x       f log x
   8       5       0.9031      4.5155
  25       3       1.3979      4.1937
  17       4       1.2304      4.9216
  30      f4       1.4771      1.4771 f4
 Total  12 + f4      -         13.6308 + 1.4771 f4

Now, $\log G = \dfrac{\sum f \log x}{\sum f}$

or, $\log 15.3 = \dfrac{13.6308 + 1.4771 f_4}{12 + f_4}$

or, $(1.1847)(12 + f_4) = 13.6308 + 1.4771 f_4$
or, $14.2164 + 1.1847 f_4 = 13.6308 + 1.4771 f_4$
or, $14.2164 - 13.6308 = 1.4771 f_4 - 1.1847 f_4$

or, $0.5856 = 0.2924 f_4$ ∴ $f_4 = \dfrac{0.5856}{0.2924} = 2$ (app.)


Advantages and Disadvantages of Geometric Mean. 


Advantages : 


(i) It is not influenced by the extreme items to the same 
extent as mean. 


(ii) It is rigidly defined and its value is a precise figure. 


(iii) It is based on all observations and capable of further 
algebraic treatment. 


(iv) It is useful in calculating index numbers. 


Disadvantages : 
(i) It is neither easy to calculate nor simple to understand.


(ii) If any value of a set of observations is zero, the geometric
mean would be zero, and it cannot be determined by the
logarithmic formula.
(iii) If again any value becomes negative, geometric mean
may become imaginary.


Uses of Geometric Mean. 


1. It is used to find average of the rates of changes. If the
prices, for the years 1970 to 1972, be increased by 5%, 7% and 12%
respectively, then average annual increase is not 8% $\left[= \dfrac{5 + 7 + 12}{3} = \dfrac{24}{3}\right]$
as determined by mean, but 7.5% (G.M. of 5, 7, 12) as determined by
geometric mean. Geometric mean is useful in measuring growth of
population, as population increases in geometric progression.

2. It is considered to be the best average for the construction
of index numbers.
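Harmonic Mean (H.M.).

Definition : The Harmonic Mean (H) for observations $x_1, x_2, \ldots, x_n$
is the total number divided by the sum of the reciprocals of the
numbers,

i.e., $H = \dfrac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}} = \dfrac{n}{\sum \dfrac{1}{x}}$

Again, $\dfrac{1}{H} = \dfrac{\sum \dfrac{1}{x}}{n}$ (i.e., reciprocal of H.M. = A.M. of reciprocals
of the numbers).
Example.
Find the H.M. of 3, 6, 12 and 15.

H.M. $= \dfrac{4}{\dfrac{1}{3} + \dfrac{1}{6} + \dfrac{1}{12} + \dfrac{1}{15}} = \dfrac{4}{\dfrac{20 + 10 + 5 + 4}{60}} = \dfrac{4 \times 60}{39} = 6.15$ (app.)

Example.
Find the H.M. of $1, \dfrac{1}{2}, \dfrac{1}{3}, \ldots, \dfrac{1}{n}$.

H.M. $= \dfrac{n}{1 + 2 + 3 + \cdots + n} = \dfrac{n}{\dfrac{n}{2}(2 + n - 1)} = \dfrac{2n}{n(n + 1)} = \dfrac{2}{n + 1}$

Note. The denominator is in A.P., use $S = \dfrac{n}{2}\{2a + (n - 1)d\}$.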
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Example. 


A motor car covered a distance of 50 miles four times. The first
time at 50 m.p.h., the second at 20 m.p.h., the third at 40 m.p.h.,
and the fourth at 25 m.p.h. Calculate the average speed and explain
the choice of the average. [C.A. Nov. 1967]

Average Speed (H.M.) $= \dfrac{4}{\dfrac{1}{50} + \dfrac{1}{20} + \dfrac{1}{40} + \dfrac{1}{25}} = \dfrac{4}{\dfrac{20 + 50 + 25 + 40}{1000}}$

$= 4 \times \dfrac{1000}{135} = 29.63 = 30$ (app.) m.p.h.

For a statement of the form x units per hour, when the different
distances covered are given, H.M. is to be used to find the average.
If again the hours (i.e., times of journey) are given, A.M. is to be
used to find the average. In the above example, miles (distances) are
given, so we have used H.M.

Weighted H.M. The formula to be used is as follows :

H.M. $= \dfrac{N}{\dfrac{f_1}{x_1} + \dfrac{f_2}{x_2} + \cdots + \dfrac{f_n}{x_n}} = \dfrac{N}{\sum \dfrac{f}{x}}$, where $\sum f = N$


Example.


(a) A person travelled 20 k.m. at 5 k.m.p.h. and again 24 k.m.
at 4 k.m.p.h., to find average speed.

(b) A person travelled 20 hours at 5 k.m.p.h. and again 24 hours
at 4 k.m.p.h., to find average speed.


(a) We are to apply H.M. (weighted) in this case, since distances
are given.
Average speed (H.M.) $= \dfrac{20 + 24}{\dfrac{20}{5} + \dfrac{24}{4}} = \dfrac{44}{4 + 6} = \dfrac{44}{10} = 4.4$ k.m.p.h.

(b) We are to apply A.M. (weighted), since times of journey are
given.
Average speed (A.M.) $= \dfrac{20 \times 5 + 24 \times 4}{20 + 24} = \dfrac{100 + 96}{44} = \dfrac{196}{44} = 4.45$ (app.)
k.m.p.h.
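The choice between the two averages for speeds can be sketched as below: distances act as weights for the harmonic mean, times as weights for the arithmetic mean. The function names are assumptions for illustration.

```python
def weighted_harmonic_mean(values, weights):
    """N / sum(f / x): use when the weights are distances covered."""
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


def weighted_arithmetic_mean(values, weights):
    """sum(f * x) / N: use when the weights are times of journey."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


speeds = [5, 4]                                            # k.m.p.h.
by_distance = weighted_harmonic_mean(speeds, [20, 24])     # 20 km and 24 km
by_time = weighted_arithmetic_mean(speeds, [20, 24])       # 20 hrs and 24 hrs

# Motor car: four stretches of 50 miles at 50, 20, 40 and 25 m.p.h.
car = weighted_harmonic_mean([50, 20, 40, 25], [50, 50, 50, 50])
print(by_distance, round(by_time, 2), round(car, 2))
```

In each case the answer is simply total distance divided by total time; the two formulas differ only in which of those totals is given directly.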
Advantages and Disadvantages of Harmonic Mean. 
Advantages : 


(i) Like A.M. and G.M. it is also based on all observations.
(ii) It is capable of further algebraic treatment.
(iii) It is extremely useful while averaging certain types of
rates and ratios.


Disadvantages :


(i) It is not readily understood nor can it be calculated with
ease.

(ii) It is usually a value which may not be a member of the
given set of numbers.

(iii) It cannot be calculated when there are both negative and
positive values in a series or one or more values is zero.

Relations between A.M., G.M. and H.M. 


(1) The Arithmetic Mean is never less than the Geometric
Mean, again Geometric Mean is never less than the Harmonic Mean.
[I.C.W.A. June '79]

i.e., A.M. $\ge$ G.M. $\ge$ H.M.

For the observations $x_1$ and $x_2$, we know
$(\sqrt{x_1} - \sqrt{x_2})^2 \ge 0$ or, $x_1 + x_2 - 2\sqrt{x_1x_2} \ge 0$

or, $\dfrac{x_1 + x_2}{2} \ge \sqrt{x_1x_2}$ or, A.M. $\ge$ G.M.
This is for two observations only. Similarly for the other observations
$x_3$, $x_4$ we can show $\dfrac{x_3 + x_4}{2} \ge \sqrt{x_3x_4}$. Again for the observations
$\dfrac{x_1 + x_2}{2}$ and $\dfrac{x_3 + x_4}{2}$ we can show (similarly)

$$\frac{\dfrac{x_1 + x_2}{2} + \dfrac{x_3 + x_4}{2}}{2} \ge \sqrt{\frac{x_1 + x_2}{2} \cdot \frac{x_3 + x_4}{2}}$$

or, $\dfrac{x_1 + x_2 + x_3 + x_4}{4} \ge \sqrt{\sqrt{x_1x_2} \cdot \sqrt{x_3x_4}}$, as $\dfrac{x_1 + x_2}{2} \ge \sqrt{x_1x_2}$ and $\dfrac{x_3 + x_4}{2} \ge \sqrt{x_3x_4}$

or, $\dfrac{x_1 + x_2 + x_3 + x_4}{4} \ge \sqrt[4]{x_1x_2x_3x_4}$

∴ A.M. $\ge$ G.M. (this is for four observations).
In this way, for any number of observations, we have A.M. $\ge$ G.M.
Again for $\dfrac{1}{x_1}$ and $\dfrac{1}{x_2}$ (observations) :

$\left(\dfrac{1}{\sqrt{x_1}} - \dfrac{1}{\sqrt{x_2}}\right)^2 \ge 0$ ; or, $\dfrac{1}{x_1} + \dfrac{1}{x_2} \ge \dfrac{2}{\sqrt{x_1x_2}}$

or, $\dfrac{1}{\sqrt{x_1x_2}} \le \dfrac{\dfrac{1}{x_1} + \dfrac{1}{x_2}}{2}$ or, $\sqrt{x_1x_2} \ge \dfrac{2}{\dfrac{1}{x_1} + \dfrac{1}{x_2}}$

or, G.M. $\ge$ H.M. (This is true for any number of observations)
∴ A.M. $\ge$ G.M. $\ge$ H.M.
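The ordering of the three means can be checked numerically for any set of positive values. The sketch below uses 3, 6, 12, 15 as sample data; the function name `three_means` is an assumption for illustration.

```python
import math


def three_means(values):
    """Return (A.M., G.M., H.M.) of a list of positive values."""
    n = len(values)
    am = sum(values) / n
    gm = math.exp(sum(math.log(v) for v in values) / n)
    hm = n / sum(1 / v for v in values)
    return am, gm, hm


am, gm, hm = three_means([3, 6, 12, 15])
ordered = am >= gm >= hm

# When all the values are equal, the three means coincide.
am_eq, gm_eq, hm_eq = three_means([4, 4, 4])
print(round(am, 2), round(gm, 2), round(hm, 2), ordered)
```

The sign of equality appears only in the second case, where every observation has the same value.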


(2) For a pair of observations only,

$\dfrac{\text{A.M.}}{\text{G.M.}} = \dfrac{\text{G.M.}}{\text{H.M.}}$
or, $(\text{G.M.})^2 = \text{A.M.} \times \text{H.M.}$


i.e., the Geometric Mean for a pair of observations is the
geometric mean of their Arithmetic and Harmonic Means.


Median.


Definition : If a set of observations are arranged in order of
magnitude (ascending or descending), then the middle most or central
value gives the median.


Median divides the observations in two equal parts, in such a
way that the number of observations smaller than median is equal to
the number greater than it. It is not affected by extremely large or
small observations. Median is, thus, an average of position. In a certain
sense, it is the real measure of central tendency.


Calculation of Median.
(A) For Series of Individual Observations* :


At first, the given data are to be arranged in order of magnitude
(ascending or descending). Now for n (the total number of items) odd,

median = value of $\dfrac{n + 1}{2}$ th item

and for n even,

median = average value of $\dfrac{n}{2}$ th item and $\left(\dfrac{n}{2} + 1\right)$ th item
(i.e., the next item).


Note. $\dfrac{n + 1}{2}$ th item gives the location of median, but not its magnitude.
Example.


To find the median of the following marks obtained by
7 students :

4, 12, 7, 9, 14, 17, 16


* Individual observations are those observations (or variates) having no
frequencies or frequency is unit in every case.

(i) Arrangement of marks : 4, 7, 9, 12, 14, 16, 17
(ii) n = 7 = an odd number

(iii) Median = value of $\dfrac{n + 1}{2}$ th item
= value of $\dfrac{7 + 1}{2}$ th item
= value of 4th item
= 12 (from the arranged data)

∴ median mark is 12.


Example. 
To find the median of marks :
4, 12, 7, 9, 14, 17, 16, 21
(i) Arrangement : 4, 7, 9, 12, 14, 16, 17, 21
(ii) n = 8 = an even number.

(iii) Median = average value of $\dfrac{n}{2}$ th item and the next item
= average value of $\dfrac{8}{2}$ th item and the next item
= average value of 4th item and 5th item
= average value of 12 and 14 marks $= \dfrac{12 + 14}{2}$
= 13 marks.
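The rule for individual observations can be sketched as below. The function name is an assumption; the eighth mark (21) in the second list is an assumed reading of a value that is hard to make out in the source, and it does not affect the two middle items.

```python
def median(values):
    """Middle item for odd n; average of the two middle items for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("no observations")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # (n + 1)/2 th item
    return (ordered[mid - 1] + ordered[mid]) / 2   # n/2 th and next item


odd_case = median([4, 12, 7, 9, 14, 17, 16])
even_case = median([4, 12, 7, 9, 14, 17, 16, 21])  # 21 is an assumed value
print(odd_case, even_case)
```

Python's list positions start at 0, so the 4th item of the arranged data is `ordered[3]`, which is what `ordered[mid]` picks out when n = 7.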


(B) For Discrete Series (Simple Frequency Distribution) : 


Cumulative frequency (less than type) is calculated. Now the
value of the variable corresponding to the cumulative frequency
$\dfrac{N + 1}{2}$ gives the median, when N is the total frequency.
Example.
To find the median of the following frequency distribution :

x :  1    2    3    4    5    6
f :  7   12   17   19   21   24

   x       f      cum. freq.
   1       7          7
   2      12         19
   3      17         36
   4      19         55
   5      21         76
   6      24        100 (= N)

Now, median = value of $\dfrac{N + 1}{2}$ th item
= value of $\dfrac{100 + 1}{2}$ th item
= value of 50.5th item.

From the last column, it is found that 50.5 is greater than the
cumulative frequency 36, but less than the next cum. freq. 55
corresponding to x = 4. All the 19 items (from 37 to 55) have the same
variate 4. And the 50.5th item is also one of these 19 items.
∴ Median = 4.


(C) For Continuous Series (Grouped Frequency Distribution) :
We are to determine the particular class in which the value of
the median lies, by using the formula $\dfrac{N}{2}$ (and not by $\dfrac{N + 1}{2}$, as in
continuous series $\dfrac{N}{2}$ divides the area of the curve into two equal
parts). After locating median, its magnitude is measured by applying
the formula of interpolation given below :

Median $= l_1 + \dfrac{l_2 - l_1}{f}\,(m - c)$, where $m = \dfrac{N}{2}$.

Where, $l_1$ = lower limit of the class in which median lies,
$l_2$ = upper limit of the class in which median lies,
f = the frequency of the class in which median falls,
m = middle item (i.e., item at which median is located
or $\dfrac{N}{2}$ th item),
c = cumulative frequency of the class preceding the
median class.


Note. The above formula is based on the assumption that the frequencies of
the class-interval in which median lies are uniformly distributed over the entire
class-interval.
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Example. 
Find the median and median-class of the data given below :

Class-boundaries   15-25   25-35   35-45   45-55   55-65   65-75
Frequency             4      11      19      14       0       2

[I. C. W. A. Jan. 1965]

Class-boundaries   Frequency   Cumulative Frequency
    15-25              4              4
    25-35             11             15
    35-45             19             34
    45-55             14             48
    55-65              0             48
    65-75              2             50 (= N)

Median = value of $\dfrac{N}{2}$ th item = value of $\dfrac{50}{2}$ th item
= value of 25th item, which is greater than cum. freq. 15
and less than cum. freq. 34. So median lies in the class 35-45.

Now, median $= l_1 + \dfrac{l_2 - l_1}{f}(m - c)$, where $l_1 = 35$, $l_2 = 45$, f = 19,
m = 25, c = 15

$= 35 + \dfrac{45 - 35}{19}(25 - 15)$

$= 35 + \dfrac{10}{19} \times 10 = 35 + \dfrac{100}{19} = 35 + 5.26 = 40.26$

∴ reqd. median is 40.26 and median-class is (35-45).
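The interpolation formula for a continuous series can be sketched as below. The function name and the tuple it returns are assumptions for illustration; the sketch follows the stated assumption that frequencies are spread uniformly within the median class.

```python
def grouped_median(boundaries, freqs):
    """Median = l1 + (l2 - l1) / f * (m - c), with m = N / 2.

    Returns the median together with the median-class (l1, l2).
    """
    total = sum(freqs)
    m = total / 2
    cum = 0                       # c: cumulative frequency before the class
    for (lo, hi), f in zip(boundaries, freqs):
        if f > 0 and cum + f >= m:
            return lo + (hi - lo) / f * (m - cum), (lo, hi)
        cum += f
    raise ValueError("empty distribution")


boundaries = [(15, 25), (25, 35), (35, 45), (45, 55), (55, 65), (65, 75)]
freqs = [4, 11, 19, 14, 0, 2]

med, median_class = grouped_median(boundaries, freqs)
print(round(med, 2), median_class)
```

The loop stops at the first class whose cumulative frequency reaches N/2, which is the class 35-45 here, and the interpolation then gives about 40.26.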


Example. 
Compute the median from the following data :

Mid-value   Frequency      Mid-value   Frequency
   115          6             165         60
   125         25             175         38
   135         48             185         22
   145         72             195          3
   155        116

At first we are to find the class-boundaries from the mid-values
given.

Class-boundaries   Frequency   Cumulative Frequency
   110-120             6              6
   120-130            25             31
   130-140            48             79
   140-150            72            151
   150-160           116            267
   160-170            60            327
   170-180            38            365
   180-190            22            387
   190-200             3            390 (= N)

Median = value of $\dfrac{N}{2}$ th item = value of $\dfrac{390}{2}$ th item
= value of 195th item, so median lies in the class
(150-160).

Again, median $= l_1 + \dfrac{l_2 - l_1}{f}(m - c)$,
$l_1 = 150$, $l_2 = 160$, f = 116, m = 195, c = 151

$= 150 + \dfrac{160 - 150}{116}(195 - 151)$

$= 150 + \dfrac{10}{116} \times 44 = 150 + 3.79 = 153.79 = 153.8$ (app.)


Example.

The following is the Table which gives you the distribution of
marks secured by some students in an examination :

Marks :           0—20 | 21—30 | 31—40 | 41—50 | 51—60 | 61—70 | 71—80
No. of students :  42  |  38   |  120  |  84   |  48   |  36   |  31

Find : (i) median marks, (ii) the percentage of failure if the
minimum for a pass is 35 marks. [ C. A. Nov. 1969 ]

We are to make the class-boundaries from the class-limits given
and then to find the cumulative frequency.

Class-boundaries | Frequency | Cumulative Frequency
0—20.5           |    42     |   42
20.5—30.5        |    38     |   80
30.5—40.5        |   120     |  200
40.5—50.5        |    84     |  284
50.5—60.5        |    48     |  332
60.5—70.5        |    36     |  368
70.5—80.5        |    31     |  399 (=N)

Median = value of $\frac{N}{2}$ th item = value of $\frac{399}{2}$ th item
= value of 199.5th item.

Median lies in (30.5—40.5).
Now, $l_1 = 30.5$, $l_2 = 40.5$, $f = 120$, $m = 199.5$, $c = 80$

$$\text{Median} = l_1 + \frac{l_2 - l_1}{f}(m - c) = 30.5 + \frac{40.5 - 30.5}{120}(199.5 - 80)$$

$$= 30.5 + \frac{10}{120} \times 119.5 = 30.5 + 9.96 = 40.46.$$

The mark 35 represents the interval (34.5—35.5), taking marks as
a continuous variable. Minimum pass mark is 34.5. Number of
students (F) obtaining less than 34.5 is the cumulative frequency
corresponding to 34.5 marks. Now using the above median formula,
putting respective values, we find,

$$34.5 = 30.5 + \frac{40.5 - 30.5}{120}(F - 80)$$

or, $34.5 = 30.5 + \frac{10}{120}(F - 80)$

or, $34.5 - 30.5 = \frac{1}{12}(F - 80)$

or, $4 = \frac{F - 80}{12}$

or, $F - 80 = 48$
∴ $F = 128$

∴ reqd. percentage of failure $= \frac{128}{399} \times 100 = 32.08\%$.
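The same interpolation can be run in reverse, giving the cumulative frequency that corresponds to a stated value of the variable. The following Python sketch illustrates this under the same assumption of uniform spread within each class; the helper name `count_below` is hypothetical and chosen only for this illustration.

```python
def count_below(value, boundaries, freqs):
    """Estimated number of items lying below `value`.

    Whole classes below `value` are counted in full; the class that
    contains `value` contributes in proportion to the part of its width
    lying below `value` (uniform spread assumed).
    """
    c = 0
    for i, f in enumerate(freqs):
        l1, l2 = boundaries[i], boundaries[i + 1]
        if value <= l1:
            break
        if value >= l2:
            c += f
        else:
            c += f * (value - l1) / (l2 - l1)
            break
    return c


boundaries = [0, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5]
students = [42, 38, 120, 84, 48, 36, 31]

failed = count_below(34.5, boundaries, students)
percentage = failed / sum(students) * 100
print(round(failed), round(percentage, 2))  # 128 32.08
```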


Note. In the first class, 0 is taken as lower boundary as there cannot be any
number less than zero.

Calculation of Median when Class-intervals are unequal.

In such cases, the frequencies need not be adjusted to make the
class-intervals equal. Formula of median is to be applied directly.

Example.
To calculate median from the following Table :

Marks :           0—10 | 10—30 | 30—60 | 60—70 | 70—90
No. of students :  15  |  25   |  30   |   4   |  10

Marks  | f  | Cum. freq.
0—10   | 15 | 15
10—30  | 25 | 40
30—60  | 30 | 70
60—70  |  4 | 74
70—90  | 10 | 84 (=N)

Median = value of $\frac{84}{2}$ th item = value of 42nd item.
So, median lies in the class (30—60).
Here, $l_1 = 30$, $l_2 = 60$, $f = 30$, $m = 42$, $c = 40$

$$\therefore \text{Median} = 30 + \frac{60 - 30}{30} \times (42 - 40) = 30 + \frac{30}{30} \times 2 = 30 + 2 = 32 \text{ marks.}$$

We will get the same result if the class-intervals are made equal :

Marks  | f    | Cum. freq.
0—10   | 15   | 15
10—20  | 12.5 | 27.5
20—30  | 12.5 | 40
30—40  | 10   | 50
40—50  | 10   | 60
50—60  | 10   | 70
60—70  |  4   | 74
70—80  |  5   | 79
80—90  |  5   | 84 (=N)

Median lies in the class (30—40).
$l_1 = 30$, $l_2 = 40$, $f = 10$, $m = 42$, $c = 40$

$$\text{Median} = 30 + \frac{40 - 30}{10}(42 - 40) = 30 + \frac{10}{10} \times 2 = 30 + 2 = 32 \text{ marks.}$$

(D) Graphic Method.
Median can be determined graphically by the following methods :

(i) Draw less than (or greater than) type ogive, taking the
variable on the X-axis and the cumulative frequency on the Y-axis. Now
corresponding to N/2 on the Y-axis draw a horizontal line to meet the ogive,
and from the point of intersection draw a perpendicular
on the X-axis. The point on the X-axis is read off, which gives the median.

Example. To find the median graphically from the following Table :

Wages (Rs.) :    10—20 | 20—30 | 30—40 | 40—50 | 50—60 | 60—70 | 70—80
No. of workers :   5   |  10   |  12   |  16   |   8   |   5   |   4

Wages (Rs.)  | No. of  | Wages (Rs.)    | No. of
(less than)  | workers | (greater than) | workers
20           |    5    | 10             |   60
30           |   15    | 20             |   55
40           |   27    | 30             |   45
50           |   43    | 40             |   33
60           |   51    | 50             |   17
70           |   56    | 60             |    9
80           |   60    | 70             |    4

We draw less than type ogive, as shown before.
Median = size of $\frac{N}{2}$ th item = size of 30th item. Now take 30 on the
Y-axis, and from 30 draw a horizontal line to meet the ogive. From
this point on the ogive, draw a perpendicular on the X-axis. The point
on the X-axis is read off. The point is 42, which gives the median. So
median is Rs. 42.

Note. If we draw greater than type ogive, we would get the same result.

(ii) Draw two ogives. From the point of intersection of the
curves (i.e., ogives), draw a perpendicular to meet the X-axis. The
point on the X-axis is read off, which gives the median.

Example.

The data given in the above example are taken here into
consideration :

Draw two ogives (less than type and greater than type). From
the point of intersection, draw a perpendicular to the X-axis. The
point on the X-axis shows 42. So median is Rs. 42.

[Fig. 32 : less than and greater than type ogives ; X-axis : wages in Rs.]


Advantages and Disadvantages of Median.

Advantages :

(i) The median, unlike the mean, is unaffected by the extreme
values of the variable.

(ii) It is easy to calculate and simple to understand, particularly
in a series of individual observations and a discrete series.

(iii) It is capable of further algebraic treatment. It is used in
calculating mean deviation.

(iv) It can be located by inspection, after arranging the data in
order of magnitude.

(v) Median can be calculated even if the items at the extremes
are not known, provided we know the central items and
the total number of items.

(vi) It can be determined graphically.

Disadvantages :

(i) For calculation, it is necessary to arrange the data ; other
averages do not need any such arrangement.

(ii) It is amenable to algebraic treatment only in a limited sense.
Unlike the mean, the medians of two or more groups cannot
be used to calculate the combined median.

(iii) It cannot be computed precisely when it lies between
two items.

(iv) Process involved to calculate median in case of continuous
series is difficult to follow.

(v) Median is affected more by sampling fluctuations than the
mean.

Other Measures (Regarding the median principle).

It has been seen that the median divides an arrayed series into two
equal parts. Now for further study of the composition of a series, it may
be divided into four, five, six, ten or hundred parts. Usually it is
divided into four, ten or hundred parts.

Just as we have one median dividing a series into two equal parts,
so three items would divide it into four parts, nine items into ten parts
and ninety-nine items into hundred parts. The values of these items
are respectively known as Quartiles, Deciles and Percentiles. Quintiles,
Septiles and Octiles divide a series respectively into five, seven and
eight parts.

Thus we find three quartiles, nine deciles and ninety-nine
percentiles in a series. The second quartile, fifth decile and fiftieth
percentile [...] parts. Third Quartile or Upper Quartile (Q3) is the value of the
variable that divides the latter half of a series [...]

The calculation of Quartiles, Deciles and Percentiles and other
such values is done by the same rules applied in calculating the median.


(A) For Series of Individual Observation.
The data are to be arranged in increasing order of magnitude :

1st Quartile, $Q_1$ = size of $\frac{n+1}{4}$ th item

3rd Quartile, $Q_3$ = size of $\frac{3(n+1)}{4}$ th item

1st Decile, $D_1$ = size of $\frac{n+1}{10}$ th item

7th Decile, $D_7$ = size of $\frac{7(n+1)}{10}$ th item

K-th Decile, $D_k$ = size of $\frac{k(n+1)}{10}$ th item (for k = 1, 2, ......, 8, 9)

1st Percentile, $P_1$ = size of $\frac{n+1}{100}$ th item

K-th Percentile, $P_k$ = size of $\frac{k(n+1)}{100}$ th item (for k = 1, 2, ......, 98, 99)


Example.
To find $Q_1$, $Q_3$, $D_4$ and $P_{60}$ from the following weights (in Kg.) :

19, 27, 24, 39, 57, 44, 56, 50, 59, 67,
62, 42, 47, 60, 26, 34, 57, 51, 59, 45.

Arrangement :

Serial no. | Weight (Kg.) | Serial no. | Weight (Kg.) | Serial no. | Weight (Kg.)
1          | 19           | 8          | 44           | 15         | 57
2          | 24           | 9          | 45           | 16         | 59
3          | 26           | 10         | 47           | 17         | 59
4          | 27           | 11         | 50           | 18         | 60
5          | 34           | 12         | 51           | 19         | 62
6          | 39           | 13         | 56           | 20         | 67
7          | 42           | 14         | 57           |            |

Here n = 20.

$Q_1$ (first quartile) = size of $\frac{n+1}{4}$ th item = size of $\frac{20+1}{4}$ th item
= size of 5.25th item
= size of 5th item + $\frac{1}{4}$ (size of 6th item - size of 5th item)
= 34 + $\frac{1}{4}$ (39 - 34) = 34 + 1.25 = 35.25 Kg.

$Q_3$ (third quartile) = size of $\frac{3(n+1)}{4}$ th item = size of $\frac{3(20+1)}{4}$ th item
= size of 15.75th item
= size of 15th item + $\frac{3}{4}$ (size of 16th item - size of 15th item)
= 57 + $\frac{3}{4}$ (59 - 57) = 57 + 1.50 = 58.50 Kg.

$D_4$ (fourth decile) = size of $\frac{4(n+1)}{10}$ th item = size of $\frac{4(20+1)}{10}$ th item
= size of 8.4th item
= size of 8th item + $\frac{4}{10}$ (size of 9th item - size of 8th item)
= 44 + $\frac{4}{10}$ (45 - 44) = 44 + 0.4 = 44.4 Kg.

$P_{60}$ (sixtieth percentile) = size of $\frac{60(n+1)}{100}$ th item
= size of $\frac{60(20+1)}{100}$ th item
= size of 12.6th item
= size of 12th item + $\frac{6}{10}$ (size of 13th item - size of 12th item)
= 51 + $\frac{6}{10}$ (56 - 51) = 51 + 3 = 54 Kg.
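The rule "size of the k(n+1)/p-th item", with interpolation between neighbouring items when the position is fractional, can be sketched in Python as below. The function names `item_value` and `partition_value` are assumptions introduced only for this illustration; the data are the twenty weights of the example above.

```python
def item_value(ordered, position):
    """Value of the `position`-th item (counting from 1) of an ordered series.

    A fractional position is resolved by linear interpolation between the
    two neighbouring items.
    """
    if not 1 <= position <= len(ordered):
        raise ValueError("position falls outside the series")
    whole = int(position)
    frac = position - whole
    lower = ordered[whole - 1]
    if frac == 0:
        return lower
    upper = ordered[whole]
    return lower + frac * (upper - lower)


def partition_value(values, k, parts):
    """k-th partition value: parts=4 quartiles, 10 deciles, 100 percentiles."""
    ordered = sorted(values)
    position = k * (len(ordered) + 1) / parts
    return item_value(ordered, position)


weights = [19, 27, 24, 39, 57, 44, 56, 50, 59, 67,
           62, 42, 47, 60, 26, 34, 57, 51, 59, 45]

q1 = partition_value(weights, 1, 4)      # 5.25th item
q3 = partition_value(weights, 3, 4)      # 15.75th item
d4 = partition_value(weights, 4, 10)     # 8.4th item
p60 = partition_value(weights, 60, 100)  # 12.6th item
print(round(q1, 2), round(q3, 2), round(d4, 2), round(p60, 2))
```

Note that library routines for quantiles often use other position conventions, so their results may differ slightly from the (n+1) rule used here.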


(B) For Discrete Series.

$Q_1$ = size of $\frac{N+1}{4}$ th item (where N is the total frequency)
= size of $\frac{98+1}{4}$ th item
= size of 24.75th item = 50 Kg.

$Q_3$ = size of $\frac{3(N+1)}{4}$ th item
= size of $\frac{3(98+1)}{4}$ th item (from the Table given)
= size of 74.25th item = 60 Kg.

$D_4$ = size of $\frac{4(N+1)}{10}$ th item
= size of $\frac{4(98+1)}{10}$ th item (from the Table given below)
= size of 39.6th item = 54 Kg.

$P_{60}$ = size of $\frac{60(N+1)}{100}$ th item
= size of $\frac{60(98+1)}{100}$ th item (from the Table given below)
= size of 59.4th item = 59 Kg.

Weight (Kg.)| Frequency ey ; 
40 ; 
49 4 
45 in 
50 i 
51 $2 
54 iid 
56 © 
59 86 
60 80 
62 92 
ae 98 (=N) 


(C) For Continuous Series.

Like median, the values of quartiles, deciles and percentiles lie
in various class-intervals and the actual values are to be calculated by
applying interpolation formulae.

$Q_1$ = size of $\frac{N}{4}$ th item
= size of $\frac{84}{4}$ th item (from the Table given below)
= size of 21st item, which lies in the class (36—40).

Now, $l_1 = 36$, $l_2 = 40$, $f = 8$,
$q$ (item in which quartile is located) = 21, $c = 20$.

$$\therefore Q_1 = l_1 + \frac{l_2 - l_1}{f}(q - c) = 36 + \frac{40 - 36}{8}(21 - 20) = 36 + \frac{4}{8} = 36 + 0.5 = 36.5 \text{ Kg.}$$

$Q_3$ = size of $\frac{3N}{4}$ th item
= size of $\frac{3 \times 84}{4}$ th item
= size of 63rd item. So, $Q_3$ lies in the class (52—56).

Now, $l_1 = 52$, $l_2 = 56$, $f = 10$, $q = 63$, $c = 62$

$$\therefore Q_3 = l_1 + \frac{l_2 - l_1}{f}(q - c) = 52 + \frac{56 - 52}{10}(63 - 62) = 52 + 0.4 = 52.4 \text{ Kg.}$$

$D_4$ = size of $\frac{4N}{10}$ th item = size of $\frac{4 \times 84}{10}$ th item (from the Table given)
= size of 33.6th item. So $D_4$ lies in the class (40—44).

$$D_4 = 40 + \frac{44 - 40}{6}(33.6 - 28) = 40 + \frac{4}{6} \times 5.6 = 40 + 3.7 = 43.7 \text{ Kg.}$$

$P_{60}$ = size of $\frac{60N}{100}$ th item
= size of $\frac{60 \times 84}{100}$ th item (from the Table given)
= size of 50.4th item. $P_{60}$ lies in the class (48—52).

$$\therefore P_{60} = 48 + \frac{52 - 48}{12}(50.4 - 50) = 48 + \frac{1}{3} \times (0.4) = 48 + 0.13 = 48.13 \text{ Kg.}$$

Weight (Kg.) | Frequency | Cum. frequency
20—24        |     2     |    2
24—28        |     3     |    5
28—32        |     5     |   10
32—36        |    10     |   20
36—40        |     8     |   28
40—44        |     6     |   34
44—48        |    16     |   50
48—52        |    12     |   62
52—56        |    10     |   72
56—60        |     7     |   79
60—64        |     5     |   84 (=N)
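For a continuous series the median formula generalises directly: the position N/2 is replaced by kN/4, kN/10 or kN/100. A short Python sketch of this generalisation follows. The function name `grouped_partition_value` is an assumption made for illustration; the data are the weights of the Table above (classes of width 4 from 20 to 64).

```python
def grouped_partition_value(boundaries, freqs, k, parts):
    """k-th partition value of a grouped distribution by interpolation.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    Uniform spread of frequencies within the located class is assumed.
    """
    n = sum(freqs)
    q = k * n / parts  # position of the required item
    c = 0              # cumulative frequency of the preceding classes
    for i, f in enumerate(freqs):
        if f > 0 and c + f >= q:
            l1, l2 = boundaries[i], boundaries[i + 1]
            return l1 + (l2 - l1) / f * (q - c)
        c += f
    raise ValueError("the distribution has no observations")


boundaries = list(range(20, 68, 4))  # 20, 24, ..., 64
freqs = [2, 3, 5, 10, 8, 6, 16, 12, 10, 7, 5]

q1 = grouped_partition_value(boundaries, freqs, 1, 4)
q3 = grouped_partition_value(boundaries, freqs, 3, 4)
d4 = grouped_partition_value(boundaries, freqs, 4, 10)
print(round(q1, 1), round(q3, 1), round(d4, 1))  # 36.5 52.4 43.7
```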




(D) By Graphic Method.

Like median, quartiles, deciles and percentiles can also be
calculated graphically with the help of cumulative frequency curves,
known as ogives. The process of estimating quartiles is shown below :

Example.
The following are the marks obtained by 123 students in statistics :

Marks obtained | No. of students
1—5            |    7
6—10           |   10
11—15          |   16
16—20          |   32
21—25          |   24
26—30          |   18
31—35          |   10
36—40          |    5
41—45          |    1
Total          |  123

Draw an ogive and locate the first and third quartiles.

At first we make the class-boundaries and cumulative frequency
as shown below :

Class-boundaries | Frequency | Cum. frequency
0.5—5.5          |     7     |    7
5.5—10.5         |    10     |   17
10.5—15.5        |    16     |   33
15.5—20.5        |    32     |   65
20.5—25.5        |    24     |   89
25.5—30.5        |    18     |  107
30.5—35.5        |    10     |  117
35.5—40.5        |     5     |  122
40.5—45.5        |     1     |  123

Drawing of ogive (less than type) and estimation of $Q_1$ and $Q_3$
are shown below :

[Figure : less than type ogive of the above distribution ; Y-axis : cumulative frequency]

We find from the above drawing,
$Q_1$ = 14.7 marks ; $Q_3$ = 26.5 marks.
Mode.

Definition : Mode is the value of the variate which occurs
most frequently. It represents the most frequent value of a series.

When one speaks of the 'average wage', 'average student', etc.,
we generally mean the modal wage, the modal student. If we say
that the modal wages obtained by workers in a factory are Rs. 70,
we mean that the largest number of workers get that amount.
Wages as high as Rs. 100 and as low as Rs. 50 are much less
frequent and they are non-modal.

Calculation : Mode cannot be determined from a series of
individual observations unless it is converted to a discrete series (or
continuous series). In a discrete series the value of the variate having
the maximum frequency is the mode. In continuous series, the
class-interval having the maximum frequency is the modal class.
However, the exact location of the mode is found by an interpolation
formula, as for the median.

Location of modal value in case of discrete series is possible if
there is concentration of items at one point. If again there are two
or more values having the same maximum frequency (i.e., more
concentrations), it becomes difficult to determine the mode. Such series
are known as bi-modal, tri-modal or multi-modal according as the items
concentrate at 2, 3 or more values.


(A) For Individual Observations.

The individual observations are to be first converted to discrete
series (if possible). Then the variate having the maximum frequency
will be the mode.

Example. Calculate mode from the data (given) :
(Marks) : 10, 14, 24, 27, 24, 12, 11, 17.

Marks | Frequency
10    |    1
11    |    1
12    |    1
14    |    1
17    |    1
24    |    2
27    |    1

(Individual observations are converted into a discrete series)

Here marks 24 occurs maximum number of times, i.e., 2. Hence
the modal marks are 24, or, mode = 24 marks.

Alternatively :
Arranging the numbers : 10, 11, 12, 14, 17, (24, 24), 27.
Now 24 occurs maximum number of times, i.e., 2.
∴ mode = 24 marks.

Note. When there are two or more values having the same maximum
frequency, then mode is ill-defined. Such a series is known as bi-modal or multi-modal
as the case may be.

Example.
Marks obtained : 24, 14, 20, 17, 20, 14.

Marks | Frequency
14    |    2
17    |    1
20    |    2
24    |    1

Here 14 occurs 2 times (max.)
and 20 occurs 2 times (max.)
∴ mode is ill-defined.


(B) For Discrete Series.
To find the mode from the following Table :

Height in inches : 57  59  61  62  63  64  65  66  67  69
No. of persons :    3   5   7  10  20  22  24   5   2   2

Frequencies given in column (1) of the Grouping Table below are grouped by two's
in columns (2) and (3) and then by three's in columns (4), (5) and (6). Each
grouped total is written against the first item of its group, and the
maximum frequency in each column is marked with an asterisk (*). We do
not find any fixed point having maximum frequency ; it changes with
the change of grouping. In the Analysis Table, the sizes of maximum
frequency in respect of different columns are arranged.

GROUPING TABLE

Height in |              Frequency
inches    | (1)  | (2)  | (3)  | (4)  | (5)  | (6)
57        |   3  |   8  |      |  15  |      |
59        |   5  |      |  12  |      |  22  |
61        |   7  |  17  |      |      |      |  37
62        |  10  |      |  30  |  52* |      |
63        |  20  |  42* |      |      |  66* |
64        |  22  |      |  46* |      |      |  51*
65        |  24* |  29  |      |  31  |      |
66        |   5  |      |   7  |      |   9  |
67        |   2  |   4  |      |      |      |
69        |   2  |      |      |      |      |

ANALYSIS TABLE

Column       | Size of item having maximum frequency
1            |    |    |    | 65 |
2            |    | 63 | 64 |    |
3            |    |    | 64 | 65 |
4            | 62 | 63 | 64 |    |
5            |    | 63 | 64 | 65 |
6            |    |    | 64 | 65 | 66
No. of times |  1 |  3 |  5 |  4 |  1

From the above Table, we find 64 is the size of the item which is
most frequented. The mode is, therefore, located at 64.

Note. At a glance from column (1) one might think that 65 is the mode
since it contains maximum frequency. This impression is corrected by the process
of grouping. So it is not advisable to locate the mode merely by inspection.
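The method of grouping is mechanical enough to be sketched in a few lines of Python. The sketch below builds the six columns (single frequencies, pairs starting from the 1st and 2nd items, triples starting from the 1st, 2nd and 3rd items), finds the maximum total of each column, and counts how often each size belongs to a maximum group. The function name `grouping_tally` is an assumption made for this illustration only.

```python
from collections import Counter


def grouping_tally(sizes, freqs):
    """Count, for each size, the columns in which it shares the maximum total."""
    tally = Counter()
    # (group length, starting offset) for columns (1) to (6)
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for length, offset in columns:
        groups = [
            (sum(freqs[i:i + length]), sizes[i:i + length])
            for i in range(offset, len(freqs) - length + 1, length)
        ]
        best = max(total for total, _ in groups)
        for total, members in groups:
            if total == best:
                tally.update(members)
    return tally


heights = [57, 59, 61, 62, 63, 64, 65, 66, 67, 69]
persons = [3, 5, 7, 10, 20, 22, 24, 5, 2, 2]

tally = grouping_tally(heights, persons)
mode = tally.most_common(1)[0][0]
print(sorted(tally.items()), mode)
```

The printed tally corresponds to the last row of the Analysis Table, and the size with the highest count is taken as the mode.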


(C) For Continuous Series.

By inspection or by preparing Grouping Table and Analysis
Table, ascertain the modal class. Then to find the exact value of mode,
apply the following formula :

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i,$$

where $l$ = lower class-boundary of modal class,
$f_1$ = frequency of modal class,
$f_0$ = frequency of the class preceding the modal class,
$f_2$ = frequency of the class succeeding the modal class,
$i$ = size of class-interval of modal class.
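As a rough illustration, the formula can be written as a small Python function. The name `grouped_mode` is hypothetical, the modal class is located here by simple inspection (maximum frequency), and equal class-intervals are assumed; a missing neighbouring class is treated as having frequency 0, which is an assumption of this sketch and not a rule stated in the text.

```python
def grouped_mode(boundaries, freqs):
    """Mode of a grouped distribution with equal class-intervals."""
    # modal class located by inspection: the class of maximum frequency
    k = max(range(len(freqs)), key=lambda j: freqs[j])
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0               # preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0  # succeeding class
    l = boundaries[k]
    i = boundaries[k + 1] - boundaries[k]
    return l + (f1 - f0) / (2 * f1 - f0 - f2) * i


# marks 10-20, 20-30, ..., 60-70 with frequencies 5, 8, 12, 16, 10, 8
marks_mode = grouped_mode([10, 20, 30, 40, 50, 60, 70], [5, 8, 12, 16, 10, 8])
print(marks_mode)  # 44.0
```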


Example.

Calculate mode from the following data :

Marks    | No. of students | Marks    | No. of students
above 10 |       59        | above 50 |       18
  "   20 |       54        |   "   60 |        8
  "   30 |       46        |   "   70 |        0
  "   40 |       34        |          |

We are to convert the cumulative frequency distribution into a
simple frequency distribution.

Marks  | No. of students
10—20  |    5
20—30  |    8
30—40  |   12
40—50  |   16
50—60  |   10
60—70  |    8

The modal class is (40—50), since the max. frequency is 16. Here,
$l = 40$, $f_0 = 12$, $f_1 = 16$, $f_2 = 10$, $i = 10$

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 40 + \frac{16 - 12}{32 - 12 - 10} \times 10$$

$$= 40 + \frac{4}{10} \times 10 = 40 + 4 = 44 \text{ marks.}$$


Location of Mode Graphically.

In case of the Frequency Distribution, Mode can be located
graphically.

Draw a histogram of the data given. Inside the modal
class-bar, draw two lines diagonally, starting from each upper corner of
the bar to the upper corner of the adjacent bar (as shown in the next
figure). Now draw a perpendicular from the point of intersection of
the diagonal lines to the X-axis. The point on the X-axis is read off,
which gives the modal value.

Example.

The monthly profits in rupees of 100 shops are distributed as
follows :

Profits per shop : 0—100 | 100—200 | 200—300 | 300—400 | 400—500 | 500—600
No. of shops :       12  |    18   |    27   |    20   |    17   |     6

Draw the histogram to the data and hence find the modal value.
Check this value by direct calculation. [ I.C.W.A. Jan. 1964 ]

[Fig. 34 : Histogram showing the distribution of profits]

From the graph, Mode is found to be Rs. 256 (app.).

Now for direct calculation, we find modal class is (200—300),
since the class has got the highest frequency.

Again $l = 200$, $f_0 = 18$, $f_1 = 27$, $f_2 = 20$, $i = 100$

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 200 + \frac{27 - 18}{54 - 18 - 20} \times 100$$

$$= 200 + \frac{9}{16} \times 100 = 200 + 56.25 = \text{Rs. } 256.25.$$


Calculation of Mode when class-intervals are unequal : If the class-intervals
are unequal, then we are to make them equal, having frequencies adjusted. Then
the formula for computing the value of mode is to be applied.

Advantages and Disadvantages of Mode.
Advantages :
(i) It can often be located by inspection.

(ii) It is not affected by extreme values. It is often a
really typical value.

(iii) It is simple and precise. It is an actual item of the series
except in a continuous series.

(iv) Mode can be determined graphically, unlike Mean.

Disadvantages :
(i) It is unsuitable for algebraic treatment.

(ii) When the number of observations is small, the Mode may
not exist, while the Mean and Median can be calculated.

(iii) The value of Mode is not based on each and every item of
the series.

(iv) It does not lead to the aggregate, if the Mode and the total
number of items are given.

Empirical Relationship between Mean, Median and Mode

A distribution in which the values of Mean, Median and Mode
coincide is known as symmetrical, and if the above values are not equal
then the distribution is said to be asymmetrical or skewed. In a moderately
asymmetrical distribution, there is a relation amongst Mean, Median and Mode
which is as follows :

Mean - Mode = 3 (Mean - Median).
If any two values are known, we can find the other.

Example.
In a moderately asymmetrical distribution the Mode and Mean
are 32.1 and 35.4 respectively. Calculate the Median.
[ Delhi, B. Com. 1966 ]
From the relation, we find 3 Median = 2 Mean + Mode
or, 3 Median = 2 × 35.4 + 32.1 = 70.8 + 32.1 = 102.9
∴ Median = 34.3.
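Since the empirical relation Mean - Mode = 3 (Mean - Median) links three quantities, any one of them follows from the other two by rearrangement. A minimal Python sketch of the three rearrangements is given below; the function names are assumptions made for illustration, and the results hold only to the extent that the distribution is moderately asymmetrical.

```python
def median_from(mean, mode):
    # 3 Median = 2 Mean + Mode
    return (2 * mean + mode) / 3


def mode_from(mean, median):
    # Mode = 3 Median - 2 Mean
    return 3 * median - 2 * mean


def mean_from(median, mode):
    # 2 Mean = 3 Median - Mode
    return (3 * median - mode) / 2


median = median_from(mean=35.4, mode=32.1)
print(round(median, 1))  # 34.3
```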


Which Average is to apply ?

For all circumstances, no one average can be regarded as best.

A.M. should be avoided in cases of skewed distributions, open-end
intervals, for averaging speeds and for extreme items.

G.M. is to be applied for construction of index numbers and for
computing average rates of increase or decrease.

H.M. is useful for finding rates, time, etc.

Median is the best average in open and grouped distributions, in
case of price or income distributions.

Mode is a particularly useful average for discrete series, i.e.,
number of persons wearing a given size of shoe or number of children
per household. For a very large frequency, Mode is suited best.


More Examples. 

1. The following data relate to the weights of 90 persons. You
are required to form a frequency distribution with class-interval
10 pounds like 100—110, 110—120, etc., and hence compute the Mean,
Median, Quartiles and Mode.


134)156 121 169. 168 1857. 112 4195 © 194 
186 122 «187 = «167 «1183S 107s Ss14Q~Ss«87 
1035 APB: BL" 189" *197* 167" 798 2° eR aa 
136 «110, 162 119° 146 153-149 196 173 
145 180 169 4117 141 144 1165 156 166 
V7 185188. 168.142. 116 1: 195, 148 ~—-109 
176 «6120 «147 «6188S 182187 148487 
164178. 182 185. 186’ 1499. a70. | 189. ae aa 
140 «155° 115 147-187" 147° 8499" 449 180 
146-149-198: 160., 188, 104 181, 4181 .. 148

Frequency distribution of Weights and Computations of
Mean, Median, Quartiles and Mode.

Weight (lbs.) | f  | Mid-value x | d = x - A | d' = d/10 | fd'         | Cum. freq.
(1)           | (2)| (3)         | (4)       | (5)       | (6)=(2)×(5) | (7)
100—110       |  4 | 105         | -40       | -4        | -16         |  4
110—120       |  7 | 115         | -30       | -3        | -21         | 11
120—130       | 10 | 125         | -20       | -2        | -20         | 21
130—140       | 15 | 135         | -10       | -1        | -15         | 36
140—150       | 19 | 145         |   0       |  0        |   0         | 55
150—160       | 13 | 155         |  10       |  1        |  13         | 68
160—170       |  9 | 165         |  20       |  2        |  18         | 77
170—180       |  6 | 175         |  30       |  3        |  18         | 83
180—190       |  4 | 185         |  40       |  4        |  16         | 87
190—200       |  3 | 195         |  50       |  5        |  15         | 90 (=N)
Total         | 90 |             |           |           |   8         |

Let A (assumed Mean) = 145.

$$\text{A.M.} = A + \frac{\sum fd'}{\sum f} \times i = 145 + \frac{8}{90} \times 10 = 145 + 0.89 = 145.89 \text{ lbs.}$$

Median = size of $\frac{N}{2}$ th item = size of $\frac{90}{2}$ th item
= size of 45th item. Median lies in the class (140—150).
Here, $l_1 = 140$, $l_2 = 150$, $f = 19$, $m = 45$, $c = 36$.

$$\text{Median} = l_1 + \frac{l_2 - l_1}{f}(m - c) = 140 + \frac{150 - 140}{19}(45 - 36) = 140 + \frac{10}{19} \times 9 = 140 + 4.74 = 144.74 \text{ lbs.}$$

$Q_1$ (1st quartile) = size of $\frac{N}{4}$ th item = size of 22.5th item.

Now, $Q_1$ lies in the class (130—140).
Here, $l_1 = 130$, $l_2 = 140$, $f = 15$, $q = 22.5$, $c = 21$.

$$Q_1 = l_1 + \frac{l_2 - l_1}{f}(q - c) = 130 + \frac{140 - 130}{15}(22.5 - 21) = 130 + \frac{10}{15} \times 1.5 = 130 + 1 = 131 \text{ lbs.}$$

$Q_3$ (3rd quartile) = size of $\frac{3N}{4}$ th item = size of 67.5th item.

Now $Q_3$ lies in the class (150—160).
Here $l_1 = 150$, $l_2 = 160$, $f = 13$, $q = 67.5$, $c = 55$

$$Q_3 = l_1 + \frac{l_2 - l_1}{f}(q - c) = 150 + \frac{160 - 150}{13}(67.5 - 55) = 150 + \frac{10}{13} \times 12.5 = 150 + 9.62 = 159.62 \text{ lbs.}$$

The modal class is (140—150).
Here, $l = 140$, $f_0 = 15$, $f_1 = 19$, $f_2 = 13$, $i = 10$

$$\text{Mode} = l + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 140 + \frac{19 - 15}{38 - 15 - 13} \times 10 = 140 + \frac{4}{10} \times 10 = 140 + 4 = 144 \text{ lbs.}$$

2. An incomplete distribution is given below :

Variable | Frequency
10—20    |   12
20—30    |   30
30—40    |    ?
40—50    |   65
50—60    |    ?
60—70    |   25
70—80    |   18
Total    |  229

You are given that the median value is 46.

(a) Using the median formula fill up the missing frequencies,
(b) calculate the Arithmetic mean of the completed Table.
[ C. A. May 1968 ]

Let the frequency of the class (30—40) = $f_1$
and the frequency of the class (50—60) = $f_2$.
Now $12 + 30 + f_1 + 65 + f_2 + 25 + 18 = 229$
or, $f_1 + f_2 = 229 - 150 = 79$.
Median = size of $\frac{N}{2}$ th item = size of $\frac{229}{2}$ th item
= size of 114.5th item.
Now median lies in the class (40—50), since median = 46.
Here, $l_1 = 40$, $l_2 = 50$, $f = 65$, $m = 114.5$, $c = 12 + 30 + f_1 = 42 + f_1$.

From, median $= l_1 + \frac{l_2 - l_1}{f}(m - c)$, we get,

$$46 = 40 + \frac{50 - 40}{65}\{114.5 - (12 + 30 + f_1)\}$$

or, $46 = 40 + \frac{10}{65}(72.5 - f_1)$
or, $46 - 40 = \frac{10}{65}(72.5 - f_1)$
or, $6 = \frac{2}{13}(72.5 - f_1)$, i.e., $72.5 - f_1 = 39$
or, $f_1 = 33.5 = 34$ (app.)
∴ $f_2 = 79 - f_1 = 79 - 34 = 45$
∴ $f_1 = 34$ and $f_2 = 45$.

For Computation of Arithmetic Mean

Variable | Mid-value x | f   | d' = (x - 45)/10 | fd'
10—20    | 15          |  12 | -3               | -36
20—30    | 25          |  30 | -2               | -60
30—40    | 35          |  34 | -1               | -34
40—50    | 45          |  65 |  0               |   0
50—60    | 55          |  45 |  1               |  45
60—70    | 65          |  25 |  2               |  50
70—80    | 75          |  18 |  3               |  54
Total    |             | 229 |                  |  19

Let A = 45.

$$\text{A.M.} = A + \frac{\sum fd'}{\sum f} \times i = 45 + \frac{19}{229} \times 10 = 45 + 0.83 = 45.83 \text{ (app.).}$$
3. The Table below gives the diastolic blood pressure of 250 men.
The readings were made to the nearest millimetre and the central
value of each group is given :

Blood pressure (mm) : 60  65  70  75   80  85  90  95
Number of men :        4   5  31  39  114  30  25   2

Calculate from the data the mean and median.
[ I.C.W.A. July 1970 ]

For calculation of median we are to form the class-boundaries
from the mid-values given. The common difference between the
mid-values indicates that the classes will be of equal width.

Calculation of Mean and Median

Mid-values x | Class-boundaries | f   | d' = (x - 80)/5 | fd'  | Cum. Frequency
60           | 57.5—62.5        |   4 | -4              | -16  |   4
65           | 62.5—67.5        |   5 | -3              | -15  |   9
70           | 67.5—72.5        |  31 | -2              | -62  |  40
75           | 72.5—77.5        |  39 | -1              | -39  |  79
80           | 77.5—82.5        | 114 |  0              |   0  | 193
85           | 82.5—87.5        |  30 |  1              |  30  | 223
90           | 87.5—92.5        |  25 |  2              |  50  | 248
95           | 92.5—97.5        |   2 |  3              |   6  | 250 (=N)
Total        |                  | 250 |                 | -46  |

Let A = 80.

$$\text{A.M.} = A + \frac{\sum fd'}{\sum f} \times i = 80 + \frac{-46}{250} \times 5 = 80 - 0.92 = 79.08 \text{ mm.}$$

Median = size of $\frac{N}{2}$ th item = size of $\frac{250}{2}$ (= 125)th item.

Median lies in (77.5—82.5), $l_1 = 77.5$, $l_2 = 82.5$, $f = 114$, $m = 125$, $c = 79$

$$\text{Median} = l_1 + \frac{l_2 - l_1}{f}(m - c) = 77.5 + \frac{5}{114}(125 - 79)$$

$$= 77.5 + \frac{5}{114} \times 46 = 77.5 + 2.02 = 79.52 \text{ mm.}$$
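The assumed-mean (step-deviation) computation used in these examples can be sketched in Python as follows. The function name `step_deviation_mean` and its parameters are assumptions made for illustration; the data are the blood-pressure readings of the example above (mid-values 60 to 95 in steps of 5, with frequencies 4, 5, 31, 39, 114, 30, 25, 2).

```python
def step_deviation_mean(mid_values, freqs, assumed_mean, width):
    """Arithmetic mean via step deviations d' = (x - A) / i."""
    total_fd = sum(f * (x - assumed_mean) / width
                   for x, f in zip(mid_values, freqs))
    # A.M. = A + (sum of f d' / sum of f) * i
    return assumed_mean + total_fd / sum(freqs) * width


mid_values = [60, 65, 70, 75, 80, 85, 90, 95]
men = [4, 5, 31, 39, 114, 30, 25, 2]

mean_bp = step_deviation_mean(mid_values, men, assumed_mean=80, width=5)
print(round(mean_bp, 2))  # 79.08
```

The choice of the assumed mean A only affects the size of the intermediate figures; any A gives the same final mean.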
4. Given below is the distribution of 140 candidates obtaining
marks X or higher in a certain examination (all marks are given in
whole numbers) :

X :    10   20   30   40   50   60   70   80   90   100
c.f. : 140  133  118  100  75   45   25   9    2    0

Calculate the mean marks obtained by the candidates.
[ I.C.W.A. Inter. June 1975 ]

For calculating mean at first we are to transfer the cumulative
frequency distribution (given in greater than type) in the form of a
frequency distribution as follows and hence to apply the usual formula.

Frequency Distribution and Calculation of Mean

Class-intervals | Frequency (f) | Mid-pt. (x) | d = x - A | d' = d/10 | fd'
10— 20          |     7         | 15          | -40       | -4        | -28
20— 30          |    15         | 25          | -30       | -3        | -45
30— 40          |    18         | 35          | -20       | -2        | -36
40— 50          |    25         | 45          | -10       | -1        | -25
50— 60          |    30         | 55          |   0       |  0        |   0
60— 70          |    20         | 65          |  10       |  1        |  20
70— 80          |    16         | 75          |  20       |  2        |  32
80— 90          |     7         | 85          |  30       |  3        |  21
90—100          |     2         | 95          |  40       |  4        |   8
Total           |   140         |             |           |           | -53

Let A = 55

$$\text{A.M.} = A + \frac{\sum fd'}{\sum f} \times i = 55 + \frac{-53}{140} \times 10 = 55 - 3.79 = 51.21 \text{ marks.}$$

5. The numbers 3.2, 5.8, 7.9 and 4.5 have frequencies $x$, $(x+2)$,
$(x-3)$ and $(x+6)$ respectively. If the arithmetic mean is 4.876, find
the value of $x$. [ C.U. M. Com. 1973 ]

Calculation of the value of x

Numbers (x') | Frequency (f) | f x'
3.2          | x             | 3.2x
5.8          | x + 2         | 5.8x + 11.6
7.9          | x - 3         | 7.9x - 23.7
4.5          | x + 6         | 4.5x + 27.0
Total        | 4x + 5        | 21.4x + 14.9

Now A.M. $= \frac{\sum f x'}{\sum f}$

or, $4.876 = \frac{21.4x + 14.9}{4x + 5}$

or, $4.876(4x + 5) = 21.4x + 14.9$
or, $19.504x + 24.380 = 21.4x + 14.9$
or, $1.896x = 9.480$

or, $x = \frac{9.480}{1.896} = 5$.


6. Put the following information into a frequency distribution
and obtain the arithmetic mean (assuming the range of salary is
Rs. 0—500) :

For a group of wage-earners, 20%, 40%, 70% and 80% of the
wage-earners receive less than Rs. 50, 120, 300 and 350 respectively ; and
5% are receiving Rs. 400 and over.
[ C.U. B.A. (Econ.) 1965 ]

From the question it is clear that the balance 15% of the
wage-earners will lie in the group of Rs. 350 and less than Rs. 400.

From the cumulative percentage distribution as shown in the
first Table (below), we form the grouped frequency distribution as shown
in the next Table :

Cum. Percentage Distribution

Salary less than (in Rs.) | Cum. percentage
 50                       |  20
120                       |  40
300                       |  70
350                       |  80
400                       |  95
500                       | 100

Frequency Distribution of Wages

Salary (in Rs.) | Percentage of wage-earners
0— 50           |  20
50—120          |  20
120—300         |  30
300—350         |  10
350—400         |  15
400—500         |   5
Total           | 100

Taking percentage of wage-earners as weights, calculation of
arithmetic mean is shown in the Table below :

Salary (in Rs.) | Mid-value x | f   | d = x - A | d' = d/5 | fd'
0— 50           |  25         |  20 | -300      | -60      | -1200
50—120          |  85         |  20 | -240      | -48      |  -960
120—300         | 210         |  30 | -115      | -23      |  -690
300—350         | 325         |  10 |    0      |   0      |     0
350—400         | 375         |  15 |   50      |  10      |   150
400—500         | 450         |   5 |  125      |  25      |   125
Total           |             | 100 |           |          | -2575

Let A = 325

$$\text{A.M.} = A + \frac{\sum fd'}{\sum f} \times i = 325 + \frac{(-2575)}{100} \times 5 = 325 - 128.75 = \text{Rs. } 196.25.$$

7. (a) The following frequency distribution is with regard to
weight in gm. of mangoes of a given variety. If mangoes of weight
less than 443 gms. be considered unsuitable for foreign market, what is
the percentage of the yield suitable for it ? Assume the given
frequency distribution to be typical of the variety.

Weight (gm.) | 410—419 | 420—429 | 430—439 | 440—449 | 450—459 | 460—469 | 470—479 | Total
Frequency    |   14    |   20    |   42    |   54    |   45    |   18    |    7    |  200

(b) Draw an ogive of 'more than' type on the data of the above
question. Deduce from it the median of the distribution.
[ I.C.W.A. July 1968 ]

(a)
Weight (gm.) | Frequency | Cumulative frequency
410—419      |    14     |   14
420—429      |    20     |   34
430—439      |    42     |   76
440—449      |    54     |  130
450—459      |    45     |  175
460—469      |    18     |  193
470—479      |     7     |  200
Total        |   200     |

Let $x$ = total number of mangoes of weight less than 443 gms.,
which lies in (440—449).

Now, applying the interpolation formula, we find (taking lower
and upper class-boundaries),

$$443 = 439.5 + \frac{449.5 - 439.5}{54}(x - 76)$$

or, $443 - 439.5 = \frac{10}{54}(x - 76)$
or, $3.5 = \frac{10x - 760}{54}$
or, $10x - 760 = 3.5 \times 54 = 189$
∴ $x = 94.9 = 95$ (app.)

Number of suitable mangoes = 200 - 95 = 105

Percentage of total yield (suitable) $= \frac{105}{200} \times 100 = 52.5\%$.

(b)
Class-boundaries | Frequency | Cum. frequency (more than)
409.5—419.5      |    14     |  200
419.5—429.5      |    20     |  186
429.5—439.5      |    42     |  166
439.5—449.5      |    54     |  124
449.5—459.5      |    45     |   70
459.5—469.5      |    18     |   25
469.5—479.5      |     7     |    7

Ogive 'more than' type is to be drawn from the above Table
(drawing is left to the students), and hence estimate median by usual
process.

[ For check, median $= 439.5 + \frac{10}{54}(100 - 76) = 443.94 = 444$ (app.) gms. ]


8. Three groups of observations contain 8, 7 and 5 observations.
Their geometric means are 8.52, 10.12 and 7.75 respectively. Find the
geometric mean of the composite group.
[ I.C.W.A. Dec. '76 ]

Here, $n_1 = 8$, $n_2 = 7$, $n_3 = 5$, $G_1 = 8.52$, $G_2 = 10.12$, $G_3 = 7.75$.

G. M. (G) of the composite group is

$$G = \left(G_1^{\,n_1} \cdot G_2^{\,n_2} \cdot G_3^{\,n_3}\right)^{\frac{1}{n_1 + n_2 + n_3}},$$

where $G_1$, $G_2$, $G_3$ are G.M. of groups having $n_1$, $n_2$, $n_3$ observations respectively,

or, $$\log G = \frac{1}{n_1 + n_2 + n_3}\left(n_1 \log G_1 + n_2 \log G_2 + n_3 \log G_3\right)$$

$$= \frac{1}{8 + 7 + 5}\left(8 \log 8.52 + 7 \log 10.12 + 5 \log 7.75\right)$$

$$= \frac{1}{20}\left(8 \times 0.9304 + 7 \times 1.0052 + 5 \times 0.8893\right)$$

$$= \frac{1}{20}\left(7.4432 + 7.0364 + 4.4465\right) = \frac{1}{20} \times 18.9261$$

$$= 0.9463 = \log 8.837$$

∴ G.M. = 8.837 = 8.84.

EXERCISE 6 


1. Point out the advantages and disadvantages of the chief 
kinds of averages used in statistics. 

9. What is the difference between simple and weighted 
average ? Explain the circumstances under which the latter should 
be used in preference to the former. 

3, Define the different measures of central tendency explaining 
how each of them can be computed for a given frequency distribution, 

4, In each of the following cases, explain whether the descrip- 
tion applies to the mean, median or both : 

(i) Can be calculatedfrom a frequency distribution with open- 
end classes. 

(ii) "The values of all items are taken into consideraton in the 
calculation. 

(iii) The yalues of extreme items do not influence the average. 

(iv) In a distribution with a single peak and moderate skewness 
to the right, it is closer to the concentration of the distri- 
bution. 


[C. A. Nov. 1965 ] ( Ans. Median ; both ; Median ; Median.) 
5. State and prove the properties of Geometric Mean. 
6. The following are the monthly salaries in rupees of 20 
employees of a firm ; : 
18062 145 118 195 76 151 142 110 98 
65:116 100 103 71 85 80 122 132 95 
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The firm gives bonuses of Rs. 10, 15, 20, 25 and 30 for 
individuals in the respective salary groups exceeding Rs. 60 but not 
exceeding Rs. 80 ; exceeding Rs. 80 but not exceeding Rs, 100 and so on 
upto exceeding Rs, 140 but not exceeding Rs. 160. Find the average 
bonus paid per employee. [O. A. Nov. 1964] ( Ans. Rs. 19) 

7. Calculate the value of median, mode and two quartiles fro 
the following data : 


[L.CW.A. 1966] (dns. Med =40,)Modé=38'64, Qi = 34, Q, = 47°08). 


8. The frequency distribution below gives the cost of production
of sugarcane in different holdings. Obtain the Arithmetic Mean.


Cost      Frequency      Cost      Frequency
 2–6           1         18–22        52
 6–10          9         22–26        36
10–14         21         26–30        19
14–18         47         30–34         3

(Ans. 19.21)
9. Find the median height of Indian adult males from the 
following frequency distribution : 
Height (cm.)        Frequency
144.55–149.55           1
149.55–154.55           3
154.55–159.55          24
159.55–164.55          58
164.55–169.55          60
169.55–174.55          27
174.55–179.55           2
179.55–184.55           2


( Ans. 164.76 cm. )
10. An incomplete frequency distribution is given below :

Height (inches) :  5.1–6.0  6.1–7.0  7.1–8.0  8.1–9.0  9.1–10.0  10.1–11.0  11.1–12.0
No. of plants :       3        8       27        ?        17         11          9

It is known that the median height of a plant is 8.53 inches.
Calculate the missing frequency.

[ I.C.W.A. Jan. 1972 ] ( Ans. 25 )


11. The Arithmetic mean calculated from the following frequency
distribution is known to be 67.45 inches. Find the value of f3 :

Height (inches) :  60–62  63–65  66–68  69–71  72–74
Frequency :          15     54     f3     81     24

[ I.C.W.A. July 1971 ] ( Ans. 126 )


12. Find the Arithmetic mean and the years in which the modal
point and the median fall from the following data :


The No. of persons killed in accidents in the coal mines in India

Year :  1951   '52   '53   '54   '55   '56   '57   '58   '59
No. :    319   353   330   429   309   259   182   420   212


Find also where the quartiles will lie.


[ C.U. B. Com. (II) 1966 ]


( Ans. : Mean = 312.56 ; 1954 ; Q2 in 1954 ; 1951 ; Q3 in 1952 )


13. Comment on the performance of the students of the three
universities given below using simple and weighted averages :

University :      Bombay        Calcutta        Madras


of No of No of 9 No of 
ied of | % seb students hh students a students 
y |B (in hundreds) ” (in hundreds) ” (in hundreds) 

M. A. 71 3 2 81 2 

M. Com. | 83 4 76 8 76 85 

B, A. 73 | 5 73 6 74 4°5 
B.Bom.| 74 | 2 76 7 58 2 

B.Sc. | 65 | 3 65 3 70 7 
M.Sc, 3 7 73 


(Ans. : 


Simple average : 


Wi. average 


[0.A. Sips 1970 ] 


72 ; 72 
> 72°55; 


70" 1: 72°55 ) 
14. The table given below has been constructed from data
obtained from a factory showing the distribution of the number of
processed articles per day per person and the rate of payment :


Daily no. of articles      No. of persons      Rate of payment per
processed per person       processing          article processed (P)
   80–99                       12                   3.1
  100–119                      63                   3.2
  120–139                      87                   3.3
  140–159                      56                   3.4
  160–169                       3                   3.5


Calculate the rate of payment per person per article processed.
[I.C.W.A. July 1964] ( Ans. 3.301 P)


15. There are two branches of an establishment employing 100
and 80 persons respectively. If the arithmetic means of the monthly


16. The mean age of a group of 100 children was 9.35 years.
The mean age of 25 of them was 8.75 years and that of another 65
was 10.51 years. What was the mean age of the remainder ?


[C.U. M.Com. 1965] (Ans. 3.31 yrs. )


17. From an income distribution of a group of men, 20% of men
have income below Rs. 30, 35% below Rs. 70, 60% below Rs. 150 and


eras Rs. 250. The first and third quartiles are Rs. 50 and 
Rs. 170.


Put the above information in a cumulative frequency distribution
and find the median. [G.U. M.Com. 1966 ] ( Ans. Rs. 118 )


18. For a certain group of ‘saree’ weavers of Varanasi, the
median and quartiles of earnings per week are Rs. 44.30, Rs. 43.00
and Rs. 45.90 respectively. 10% of the group ea:


range of earnings per week is Rs. 40–Rs. 60. Put the data into a
[C.U. B.A. (Econ.) 1970 ]
19. Find graphically the median value from the following data
on yield of grain in pounds per 1/500 acre :


Yield       Frequency       Yield       Frequency
2.7–2.9         4           4.1–4.3        69
2.9–3.1        15           4.3–4.5        59
3.1–3.3        20           4.5–4.7        35
3.3–3.5        47           4.7–4.9        10
3.5–3.7        63           4.9–5.1         8
3.7–3.9        78           5.1–5.3         4
3.9–4.1        98


Determine the modal value from its approximate relation
with mean and median.


11 
2 


Ax ROH oS 
or 


11 


ow Om wo PF 


maw r aD 
or ao, ae lo YS) 


[I.A.S. 1962] ( Ans. 3.95 )


20. In a sample survey of 60 workers’ families living in a
factory area, the following data were obtained as regards the number
of members in the families. Form a frequency distribution and find
the mean and median family size.


10° 6 
6 4 
1 5 

ie Reus 

5 
7 


emp aan 


5 


[ I.C.W.A. July 1969 ]
(Ans. Freq. : 1, 3, 5, 6, 10, 13, 9, 5, 3, 2, 2, 1 ; Mean
= 5.97 ; Median = 6)


21. The weekly wages earned by the hundred workers of a 
factory are set out in the following table : 
Weekly wages (Rs.) : 12.5–17.5 | 17.5–22.5 | 22.5–27.5 | 27.5–32.5


No. of workers : Lat 16 25 14 


32.5–37.5 | 37.5–42.5 | 42.5–47.5 | 47.5–52.5 | 52.5–57.5


18 10 ere bay, 1 
Calculate the three quartiles of the above distribution, taking
N/4, N/2 and 3N/4 as their ranks. [C.A. Nov. 1968]


(Ans. Q1 = 21.56 ; Q2 = 26.9 ; Q3 = 35.58)
22. The following are the marks obtained by a batch of 20 
students in a certain class-test in English and Mathematics : 


Roll No. BOY BAO Ge BETS 9 10 
Marks in English 53 54 52 32 30 60 47 46 35 28 
Marks in Mathematics 58 55 95° 32°96 85 44 80> 33 72 
Roll No. 11 12 13 14 15 16 17 18 19 20 
Marks in English 25 42 33 48 72 51 45 88 65 29 


Marks in Mathematics 10 42 15 46 50 64 39 38 30 936 


In which subject is the level of knowledge of the students higher ?
[Gorakhpur, B. Com. 1966] (Ans. Median (Eng.) = 45.5,
Median (Math.) = 41.5 ; knowledge of English is higher).


23. Find the median and mode from the following table : 


No. of days No. of No. of days No. of 
absent students absent students 
less than 6 less than 30} 644 
Aen O $9) ong BO 650 
MPA TLS » w 40 653 
Hea U eae hatha: 55 
oy OB 


[C.A. May 1965] ( Ans. Median = 19.75, Mode = 11.36 )
24. From the results of the two colleges A and B, given below,
state which of them is better and why.


                 A-College               B-College
Class        Appeared   Passed       Appeared   Passed
M. A.            30        25           100        80
M. Com.          50        45           120        95
B. A.           200       150           155        70
B. Com.         120        75            80        50
Total           400       295           455       295


[ Lucknow, B.Com. 1949 ] ( Ans. A-College )


25. Find the mean, median and modal ages of married women
at first child-births.


Age at the birth
of 1st child :    13   14   15   16   17   18   19   20   21   22   23   24   25

No. of married
women :           37  162  343  390  256  483  161  355   65   85   49   46   40


[I.C.W.A. Jan. ] ( Ans. 17.72 ; 18 ; 18 )
26. Frequency distribution of weekly wages of 500 workmen :


Weekly wages (Rs.) :  11–15  16–20  21–25  26–35  36–45  46–60  61–75  76–100  Total
Freq. :                  2     23     86    154    120     75     33       7    500


Draw an ogive of this distribution and use it to find (a) the
median wage, (b) the wage limits of the central 50% of the wage-
earners and (c) the percentage of workmen earning more than
Rs. 32.60 per week. [I.C.W.A. Jan. 1967 ]


( Ans. (a) Rs. 34.53, (b) from Q1 = Rs. 26.41 to Q3 = Rs. 44.67, (c) 56% )
families in a certain country during the year 1957. Find the median,
the third quartile and the second decile of the distribution. Check
the results by the graphical method.


Age of head of family        Number
      (years)               (million)
under 25                       2.3
25–29                          4.1
30–34                          5.3
35–44                         10.6
45–54                          9.7
55–64                          6.8
65–74                          4.4
above 74                       1.8
Total                         45.0


[ I.C.W.A. 1973 ] (Ans. 44.71 yrs. ; 57.1 yrs. ; 31.95 yrs.)


28. Below is given the frequency distribution of weights of a
group of 60 students in a class in a school :


We. (Ke,):) 80-84 | 35—39 | 4044 | 45—49 | 50-54 | 55-59 

No.of | 3 5 “19 8 

Students : a : e : 

60—64 
1 2 


(a) Draw histogram for this distribution and find the modal
value.

(b) Prepare the (1) cumulative frequency (both less than and
more than types) distribution and (2) represent them graphically on the
same graph paper. Hence find the (3) median, (4) quartile deviation.

(c) With the modal and the median values as obtained in (a)
and (b), use an appropriate empirical formula to find the arithmetic
mean of this distribution.

(d) If students obtaining marks below 40 are eliminated from
the frequency distribution, what will be the revised mean ? Calculate
the mean of the two rejected classes only and use the result obtained
in (c). [ I.C.W.A. June 1976 ] ( Ans. 47.5 Kg. ; 47.3 Kg. ; 48 Kg. ;
47.2 Kg. ; 49.1 Kg. )


29. Explain what is meant by central tendency of data. What
are the common measures of central tendency ? [I.C.W.A. June ’76]


30. Point out the merits and demerits of the mean, the median
and the mode as measures of central tendency of numerical data.
[ I.C.W.A. Dec. ’76 ]


31. Form an ordinary frequency table from the following cumulative
distribution of marks obtained by 22 students and calculate (i)
A.M., (ii) Median and (iii) Mode.


Marks          No. of students
Below 10              3
Below 20              8
Below 30             17
Below 40             20
Below 50             22


[ I.C.W.A. June ’77 ] ( Ans. (i) 23.18 marks,
(ii) 23.33 marks, (iii) 24 marks )


32. (a) Given the following frequency distribution, calculate the
mean :


Monthly wages (in Rs.)      No. of workers
12.5–17.5                         2
17.5–22.5                        22
22.5–27.5                        10
27.5–32.5                        14
32.5–37.5                         3
37.5–42.5                         4
42.5–47.5                         6
47.5–52.5                         1
52.5–57.5                         1

Total                            63


(b) Draw the cumulative frequency diagram (less than type) of
the above frequency distribution and hence determine the median
wages. [I.C.W.A. Dec. ’77] ( Ans. Rs. 28.25 ; Rs. 25.75)


33. The following table gives the Vickers Hardness numbers of
20 shell cases :


663 613 627 604  60'2 
645° 665 629 G1 «G78 
650. 627 622 648 658 
622 675 675° 609 63'8 


Draw the cumulative frequency diagram of these numbers (either less
than type or more than type need be drawn).


Determine the range, upper and lower quartiles, inter-quartile
range and median. Indicate the quartiles and the median on the
cumulative frequency diagram.


34. The expenditure of 1000 families is given as under :


Expenditure (in Rs.) :  40–59  60–79  80–99  100–119  120–139
No. of families :         50     ?     500      ?        50


The median and mean for the distribution are both Rs. 87.50.
Calculate the missing frequencies.


[I.C.W.A. June ’78 ] ( Ans. 250 ; 150)


35. An aeroplane flies around a square the sides of which measure
100 Kms. each. The aeroplane covers at a speed of 100 Kms. per hour
the first side, at 200 Kms. per hour the second side, at 300 Kms. per
hour the third side and at 400 Kms. per hour the fourth side. Use
the correct mean to find the average speed round the square.


[I.C.W.A. June ’78 ] ( Ans. 192 Kms. per hour )


36. (a) Explain what is meant by central tendency of data. What
are the common measures of central tendency ?


(b) Given below is the frequency distribution of carbon content
(per cent) in 150 determinations on a certain mixed powder.

Per cent Carbon        Frequency
4.0–4.1                    1
4.2–4.3                    2
4.4–4.5                    7
4.6–4.7                   20
4.8–4.9                   25
5.0–5.1                   30
5.2–5.3                   10
5.4–5.5                   25
5.6–5.7                   30


Compute the arithmetic mean and the median.
[ I.C.W.A. Dec. ’78 ] ( Ans. 5.118 ; 5.088 )


37. Below is given the frequency distribution of marks in
Mathematics obtained by 100 students in a class :


Marks 


20—29 
30—39 
40—49 
50—59 
60—69 
70—79 
80—89 
90—99 


Total 


No. of students 


8 
10 
25 
81 
il 
12 


100 


Draw the ogive for this distribution and use it to determine the 


median. 


[ I.C.W.A. June ’79 ] ( Ans. 51.8 marks )


38. (a) In a certain country, the age-distribution of women in
1947 is as follows :


Years of age              Millions
under 10                    3.75
10 and under 20             3.30
20 and under 30             3.65
30 and under 40             3.95
40 and under 50             3.65
50 and under 60             3.15
60 and under 70             2.45
70 and under 80             2.10

Calculate the (i) A.M., (ii) Mode, (iii) Median, and (iv) First
Quartile.
(b) Prove that for any two real quantities A.M. ≥ G.M. ≥ H.M.
[ I.C.W.A. June ’79 ] ( Ans. (i) 36.61 yrs., (ii) 35 yrs.,
(iii) 35.82 yrs., (iv) 18.33 yrs. )
39. Calculate the mean and the median from the following data :


Weekly wages (Rs.)      Number of workers
Below 10                        8
Below 20                       18
Below 30                       45
Below 40                       90
Below 50                      113
Below 60                      120


[C. U. B. Com. (Hons.) 1980] ( Ans. Rs. 32.17 ; Rs. 33.33 )


40. The frequency distribution of weekly wages in a certain
factory is as follows :


Weekly wages (Rs.)      Number of workers
23–27                           2
28–32                           6
33–37                           9
38–42                          14
43–47                          32
48–52                          16
53–57                          12
58–62                           6
63–67                           2
68–72                           1


Total                         100

Draw the ogive (less than or more than type) of this distribution
and find from the ogive (i) the first quartile, (ii) the median, and
(iii) the third quartile.

[ I.C.W.A. Dec. 1979] (Ans. Rs. 40 ; Rs. 45 ; Rs. 51)


41. Find the median and mode for the following distribution :


No. of days absent      No. of students
Less than  5                   30
Less than 10                  225
Less than 15                  465
Less than 20                  580
Less than 25                  634
Less than 30                  644
Less than 35                  650
Less than 40                  653
Less than 45                  655


[ I.C.W.A. Dec. 1979] (Ans. 12.14 days ; 11.32 days)


DISPERSION 


Introduction. 


The various measures of central tendency give us one single
figure to represent the entire data. But the average, as we have
seen, has its own limitations. There are a number of series whose
averages may be identical, but which differ from each other in many ways.
In such cases further statistical analysis of the data is necessary to
study these differences. Measures of dispersion help us to study
these characteristics, i.e., the extent to which the items (or observations)
differ from one another and from the central value.


Suppose there are three series of 5 items each, as follows :


A :   50    50    50    50    50,   total = 250,   mean = 50
B :   48    45    52    50    55,   total = 250,   mean = 50
C :    2   110    40    80    18,   total = 250,   mean = 50


In A, the values of all the items are the same and do not deviate (or
scatter) from the mean. There is no dispersion.


In B, only one item is perfectly represented by the mean ; the other
items are not very much scattered, as the minimum value is 45 and
the maximum is 55.


In C, not a single item is represented by the mean and the items
vary widely. The dispersion is very large in comparison with B.
Obviously, the average does not satisfactorily represent the items.


For a correct analysis of these series, we have to study something
more than their averages. From the above, it is clear that the
deviations of the items about an average should be taken into account.
This kind of deviation is known as dispersion.
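
The comparison of the three series can be checked numerically. The short Python sketch below is only an illustration: it takes series C as 2, 110, 40, 80, 18 (values that agree with the stated total of 250), and the helper name `spread` is an assumption made here, not a term from the text. It shows that the three means coincide while the gap between the largest and smallest items differs widely.

```python
from statistics import mean

# The three series of five items each (C is taken as 2, 110, 40, 80, 18).
series = {
    "A": [50, 50, 50, 50, 50],
    "B": [48, 45, 52, 50, 55],
    "C": [2, 110, 40, 80, 18],
}

def spread(values):
    """Difference between the largest and the smallest item."""
    return max(values) - min(values)

# For every series: (mean, spread between the extremes).
summary = {name: (mean(values), spread(values)) for name, values in series.items()}

for name, (average, gap) in summary.items():
    print(f"Series {name}: mean = {average}, largest - smallest = {gap}")
```

The means alone cannot distinguish the series; the second figure in each pair does.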


A measure of dispersion is designed to state the extent to which
individual observations (or items) vary from their average. Here
we shall take into account only the amount of variation (or its degree) and
not the direction (which will be discussed later on in connection
with skewness).


Usually, the deviations of the observations from their average
(mean, median or mode) are found out, then the average of these
deviations is taken to represent the dispersion of a series. This is
why measures of dispersion are known as Averages of the second order.
As we have seen earlier, mean, median, mode, etc. are all Averages
of the first order.


Types. 


Measures of dispersion are mainly of two types : 
(A) Absolute measures, (B) Relative measures. 


(A) Absolute measures are of four types : 
(i) Range 
(ii) Quartile deviation (or Semi-interquartile range) 
(iii) Mean deviation (or Average deviation) 
(iv) Standard deviation 


(B) Among the Relative measures we find the following types : 
(i) Coefficient of quartile deviation 
(ii) Coefficient of dispersion 
(iii) Coefficient of variation 
Absolute and Relative measures : If we calculate the dispersion of a series, say,
marks obtained by students, in absolute figures, then the dispersion will also be in the
same unit (i.e., marks). This is Absolute dispersion. If, again, dispersion is calculated
as a ratio (or percentage) of the average, then it is Relative dispersion.


Range.
For a set of observations, range is the difference between the
extremes, i.e.,
Range = Maximum Value - Minimum Value.


Example. 

The marks obtained by 6 students were 24, 12, 16, 11, 40, 42.
Find the Range. If the highest mark is omitted, find the percentage
change in range.

Here maximum mark = 42, minimum mark = 11.

Range = 42 - 11 = 31 marks.

If, again, the highest mark 42 is omitted, then amongst the
remaining marks, the maximum mark is 40.

So range (revised) = 40 - 11 = 29 marks.

Change in range = 31 - 29 = 2 marks.
∴ reqd. percentage change $= \frac{2}{31} \times 100 = 6.45\%$.


Note. Range and other absolute measures of dispersion are to be expressed in
the same unit in which observations are expressed.
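
The arithmetic of the example on range may be verified with a few lines of Python. This is a minimal sketch; the function name `value_range` is chosen here for illustration only.

```python
marks = [24, 12, 16, 11, 40, 42]

def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

original = value_range(marks)

# Omit the highest mark and recompute the range.
highest = max(marks)
remaining = [m for m in marks if m != highest]
revised = value_range(remaining)

# Percentage change is measured on the original range.
pct_change = (original - revised) / original * 100

print(original, revised, round(pct_change, 2))
```

Dropping one extreme item changes the range at once, which is the sensitivity to extreme values mentioned among its disadvantages.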


Advantages and Disadvantages of Range. 

Advantages : Range is easy to understand and is simple to
compute.

Disadvantages : It is very much affected by the extreme
values. It does not depend on all the observations, but only on the
extreme values. Range cannot be computed in case of open-end
distribution.


Uses of Range.
It is popularly used in the field of quality control. Range is also
used in studying stock-market fluctuations.


Quartile Deviation (Q.D.). 
The Quartile Deviation is half of the difference between the
upper and lower quartiles.
∴ Quartile Deviation $= \tfrac{1}{2}(Q_3 - Q_1)$.


By Inter-quartile range, we understand the difference between the
two quartiles (i.e., $Q_3 - Q_1$), and half of this is the Semi-interquartile
range (semi stands for half).


Since 50% of the observations lie between the two quartiles, the
Inter-quartile range gives a fair measure of variability. The Inter-quartile
range also does not depend on all the observations, and it is affected by
sampling fluctuations.


Quartile Deviation (Q.D.) is an absolute measure of dispersion.
If it is divided by the average value of the two quartiles, we get the
Coefficient of Quartile Deviation (a relative measure of dispersion).


Symbolically, coefficient of quartile deviation
$= \dfrac{\tfrac{1}{2}(Q_3 - Q_1)}{\tfrac{1}{2}(Q_3 + Q_1)} = \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$.


For INDIVIDUAL OBSERVATIONS : 


Example. Find the quartile deviation and coefficient of quartile 
deviation of the following observations : 


(Marks) 11, 12, 14, 17, 19, 21, 27, 28, 30, 32, 33.
Here, n = 11, and the observations are arranged in order.


$Q_1$ = size of $\frac{n+1}{4}$th item = size of 3rd item = 14 marks,


$Q_3$ = size of $\frac{3(n+1)}{4}$th item = size of 9th item = 30 marks.

Quartile Deviation (Q. D.) $= \dfrac{Q_3 - Q_1}{2} = \dfrac{30 - 14}{2} = 8$ marks.


Again, Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1} = \dfrac{30 - 14}{30 + 14} = \dfrac{16}{44} = 0.364$.
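
The working above can be sketched in Python as follows. The function names are illustrative, and the linear interpolation used for fractional ranks is an assumption added here (it is not needed in this example, where both ranks are whole numbers).

```python
def item_at(sorted_values, rank):
    """Value at a 1-based rank; fractional ranks are interpolated linearly."""
    lower = int(rank)
    fraction = rank - lower
    value = sorted_values[lower - 1]
    if fraction:
        value += fraction * (sorted_values[lower] - value)
    return value

def quartile_deviation(values):
    """Return Q1, Q3, the quartile deviation and its coefficient."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

marks = [11, 12, 14, 17, 19, 21, 27, 28, 30, 32, 33]
q1, q3, qd, coefficient = quartile_deviation(marks)
print(q1, q3, qd, round(coefficient, 3))
```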


For DISCRETE SERIES :
Example. Compute the coefficient of quartile deviation from the
following data :


Wages (Rs.) :       12   14   17   21   27   30   36   Total
No. of workers :     4    6    8    7   12   10    4     51


Cumulative Frequency Distribution


Wages (Rs.)      f      Cum. freq.
    12           4          4
    14           6         10
    17           8         18
    21           7         25
    27          12         37
    30          10         47
    36           4         51 (= N)

$Q_1$ = size of $\frac{N+1}{4}$th item = size of 13th item = Rs. 17,
$Q_3$ = size of $\frac{3(N+1)}{4}$th item = size of $\frac{3 \times 52}{4}$th item
= size of 39th item = Rs. 30.


Here Q. D. $= \dfrac{30 - 17}{2} = \dfrac{13}{2}$ = Rs. 6.5.


Coefficient of Q. D. $= \dfrac{30 - 17}{30 + 17} = \dfrac{13}{47} = 0.277$.
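
For a discrete series the quartiles are read off the cumulative frequencies. A minimal Python sketch of that look-up is given below; `value_at_rank` is a hypothetical helper written for this illustration.

```python
from itertools import accumulate

wages = [12, 14, 17, 21, 27, 30, 36]
workers = [4, 6, 8, 7, 12, 10, 4]

def value_at_rank(values, freqs, rank):
    """First value whose cumulative frequency reaches the given rank."""
    for value, cumulative in zip(values, accumulate(freqs)):
        if cumulative >= rank:
            return value
    raise ValueError("rank exceeds the total frequency")

n = sum(workers)
q1 = value_at_rank(wages, workers, (n + 1) / 4)      # 13th item
q3 = value_at_rank(wages, workers, 3 * (n + 1) / 4)  # 39th item
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coefficient, 3))
```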
For Continuous SERIES : 


Example. Calculate appropriate measure of dispersion from the 
following data : 


Wages in rupees per week      No. of wage-earners
less than 35                        14
35–37                               62
38–40                               99
41–43                               18
over 43                              7


[I.C.W.A. Jan. 1964]


In the frequency distribution, there are open-end classes, so Q. D.
would be the appropriate measure of dispersion.


Cumulative Frequency Distribution


Wages (Rs.)          Cum. freq.

less than 35             14
35–37                    76
38–40                   175
41–43                   193
over 43                 200 (= N)


$Q_1$ = size of $\frac{N}{4}$th item = size of 50th item.


$Q_1$ lies in the class (34.5–37.5).
∴ $Q_1 = 34.5 + \dfrac{37.5 - 34.5}{62}(50 - 14) = 34.5 + \dfrac{3}{62} \times 36$
= 34.5 + 1.74 = Rs. 36.24.
$Q_3$ = size of 150th item ; $Q_3$ lies in the class (37.5–40.5).
∴ $Q_3 = 37.5 + \dfrac{40.5 - 37.5}{99}(150 - 76) = 37.5 + \dfrac{3}{99} \times 74$
= 37.5 + 2.24 = Rs. 39.74.


∴ Quartile Deviation $= \dfrac{39.74 - 36.24}{2} = \dfrac{3.50}{2}$ = Rs. 1.75.


Note. Since mid-values of open-end classes cannot be determined, mean
deviation and standard deviation cannot be calculated.
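
The interpolation used for the two quartiles may be sketched in Python as below. The representation of the classes as (lower boundary, upper boundary, frequency), with `None` standing for an open end, is an assumption made for this illustration, as is the function name `grouped_quantile`.

```python
# (lower boundary, upper boundary, frequency); None marks an open end.
classes = [
    (None, 34.5, 14),
    (34.5, 37.5, 62),
    (37.5, 40.5, 99),
    (40.5, 43.5, 18),
    (43.5, None, 7),
]

def grouped_quantile(classes, rank):
    """Interpolate within the class in which the item of the given rank falls."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= rank:
            if lower is None or upper is None:
                raise ValueError("the item falls in an open-end class")
            return lower + (upper - lower) / freq * (rank - cumulative)
        cumulative += freq
    raise ValueError("rank exceeds the total frequency")

n = sum(freq for _, _, freq in classes)
q1 = grouped_quantile(classes, n / 4)      # 50th item
q3 = grouped_quantile(classes, 3 * n / 4)  # 150th item
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Neither quartile falls in an open-end class, which is why Q. D. can be found here although the mid-values of the end classes are unknown.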


Advantages and Disadvantages of Quartile Deviation. 


Advantages : It is superior to range as a measure of dispersion.
In case of open-end distributions, it can be computed. It is not
affected by the presence of extreme values.
Disadvantages : Quartile deviation is neither based on all the
observations nor is it capable of further algebraic treatment. Its
value is much affected by sampling fluctuations. It is not a suitable measure
of dispersion, particularly for series in which variation is considerable.


Mean Deviation ( or Average Deviation). 


The two measures, Range and Quartile Deviation, are calculated
on the basis of only two points of a series : the extreme values in case of range
and the quartiles in case of quartile deviation. They are not based on all the
observations. Mean deviation and standard deviation, however, are
computed by taking into account all the observations of the series.


Definition.
Mean deviation of a series is the arithmetic average of the
deviations of the various items from the median or mean of that series.
Median is preferred since the sum of the deviations (ignoring signs) from the
median is less than that from the mean. So the value of mean
deviation calculated from the median is never greater than that calculated
from the mean. Mode is not considered, as its value is often indeterminate.


Mean deviation is known as First Moment of dispersion.


Computations of Mean Deviation. 


For INDIVIDUAL OBSERVATIONS : The formula is as follows :

Mean Deviation (M.D.) $= \dfrac{\sum |D|}{n}$,

where |D| (D within two vertical lines) denotes the deviations from the mean (or
median), ignoring algebraic signs (i.e., + and -).


Steps to Find M.D. :
(1) Find mean or median ;
(2) Take deviations ignoring ± signs ;
(3) Get total of deviations ;
(4) Divide the above total by the number of items.
Example.


To find the mean deviation of the following data about mean and
median : (Rs.) 2, 6, 11, 14, 16, 19, 23.

Computation of Mean Deviation


            About Mean                            About Median

Serial     x       Dev. from A.M.        Serial     x       Dev. from Med.
no.      (Rs.)     ignoring ± signs      no.      (Rs.)     ignoring ± signs
                        |D|                                      |D|

  1        2            11                 1        2            12
  2        6             7                 2        6             8
  3       11             2                 3       11             3
  4       14             1                 4       14             0
  5       16             3                 5       16             2
  6       19             6                 6       19             5
  7       23            10                 7       23             9
Total      -            40               Total      -            39


A. M. $= \frac{1}{7}(2 + 6 + 11 + 14 + 16 + 19 + 23) = \frac{1}{7} \times 91$ = Rs. 13,
Median = size of $\frac{7+1}{2}$th item = size of 4th item = Rs. 14,

Mean deviation (about mean) $= \dfrac{\sum |D|}{n} = \dfrac{40}{7}$ = Rs. 5.71.
Mean deviation (about median) $= \dfrac{\sum |D|}{n} = \dfrac{39}{7}$ = Rs. 5.57.


Note. The sum of deviations ($\sum |D|$) about median is 39, less than $\sum |D|$ about
mean (= 40). Also M.D. about median (i.e., 5.57) is less than that about mean (i.e., 5.71).

Coefficient of Mean Deviation.


About mean, Coefficient of M. D. $= \dfrac{\text{M.D.}}{\text{mean}} = \dfrac{5.71}{13} = 0.44$ (app.)


About median, Coefficient of M. D. $= \dfrac{\text{M.D.}}{\text{median}} = \dfrac{5.57}{14} = 0.40$ (app.)
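
The four steps for finding M.D. can be followed in a short Python sketch. It is an illustration only; the function name `mean_deviation` is chosen here, and the mean and median are taken from Python's standard `statistics` module.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average of the deviations from `centre`, ignoring signs."""
    return sum(abs(v - centre) for v in values) / len(values)

data = [2, 6, 11, 14, 16, 19, 23]

md_mean = mean_deviation(data, mean(data))        # about the mean (13)
md_median = mean_deviation(data, median(data))    # about the median (14)

coeff_mean = md_mean / mean(data)
coeff_median = md_median / median(data)

print(round(md_mean, 2), round(md_median, 2))
print(round(coeff_mean, 2), round(coeff_median, 2))
```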
For DISCRETE SERIES :
The formula for computing M. D. is

M.D. $= \dfrac{\sum f|D|}{\sum f}$,

where |D| = deviations from mean (or median) ignoring ± signs.


About Mean.
Example : To calculate the mean deviation of the following series :

(Marks) x :       5    10    15    20    25    Total
(Students) f :    6     7     8    11     8      40

Find also the coefficient of dispersion.


Computation of Mean Deviation (About Mean).


Marks     f     Dev. from      Step           fd'         Deviation       f|D|
                assumed        deviation                  from actual
                mean (15)      d' = d/5                   mean (16)
  x                d                                         |D|
 (1)     (2)      (3)            (4)       (5)=(2)×(4)       (6)       (7)=(2)×(6)
   5      6      -10             -2           -12            11            66
  10      7       -5             -1            -7             6            42
  15      8        0              0             0             1             8
  20     11        5              1            11             4            44
  25      8       10              2            16             9            72
Total    40        -              -             8             -           232

A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 15 + \dfrac{8}{40} \times 5 = 15 + 1 = 16$ marks.
M.D. $= \dfrac{\sum f|D|}{\sum f} = \dfrac{232}{40} = 5.8$ marks.


Coefficient of dispersion (about mean) $= \dfrac{\text{M.D.}}{\text{mean}} = \dfrac{5.8}{16} = 0.363$.
About Median.
Example. The same example as given above.


Computation of Mean Deviation (About Median)


Marks      f      Cum.       Dev. from        f|D|
                  freq.      median (15)
  x                             |D|
   5       6        6           10             60
  10       7       13            5             35
  15       8       21            0              0
  20      11       32            5             55
  25       8       40 (= N)     10             80
Total     40        -            -            230


Median = size of $\frac{N+1}{2}$th item = size of 20.5th item = 15 marks.


M. D. $= \dfrac{\sum f|D|}{\sum f} = \dfrac{230}{40} = 5.75$ marks.

Coefficient of dispersion (about median) $= \dfrac{\text{M.D.}}{\text{median}} = \dfrac{5.75}{15} = 0.383$.
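
Both computations for the discrete series (about the mean and about the median) can be reproduced with the following Python sketch. The helper `md_about` is a hypothetical name used only for this illustration.

```python
from itertools import accumulate

marks = [5, 10, 15, 20, 25]
students = [6, 7, 8, 11, 8]
n = sum(students)

def md_about(centre):
    """Frequency-weighted mean deviation about `centre`, ignoring signs."""
    return sum(f * abs(x - centre) for x, f in zip(marks, students)) / n

# Arithmetic mean of the frequency distribution.
am = sum(x * f for x, f in zip(marks, students)) / n

# Median: the value at which the cumulative frequency reaches (N + 1)/2.
median_rank = (n + 1) / 2
median = next(x for x, cum in zip(marks, accumulate(students)) if cum >= median_rank)

md_mean = md_about(am)
md_median = md_about(median)
print(am, median, round(md_mean, 2), round(md_median, 2))
```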


For CONTINUOUS SERIES :


The calculation is similar to the above process. The only difference
is that the middle points of the various class-intervals are taken as the
values from which the deviations about the mean (or median) are found.


About Mean. 


Example. Measurements of the lengths in feet of 50 iron 


rods are distributed as follows : 
Class-boundary      Frequency
2.35–2.45               1
2.45–2.55               4
2.55–2.65               7
2.65–2.75              15
2.75–2.85              11
2.85–2.95              10
2.95–3.05               2


Find, to two decimal places, the value of the mean deviation.
[C. U. M. Com. 1965]


Computation of Mean Deviation (About Mean)


Class-          f     Mid-     Dev. from     d' = d/0.1      fd'       Dev. from       f|D|
boundary              value    ass. mean                               mean 2.738
                       x       2.70 (d)                                  |D|
  (1)          (2)    (3)        (4)            (5)      (6)=(2)×(5)     (7)       (8)=(2)×(7)
2.35–2.45       1     2.40      -0.30           -3           -3         0.338        0.338
2.45–2.55       4     2.50      -0.20           -2           -8         0.238        0.952
2.55–2.65       7     2.60      -0.10           -1           -7         0.138        0.966
2.65–2.75      15     2.70       0               0            0         0.038        0.570
2.75–2.85      11     2.80       0.10            1           11         0.062        0.682
2.85–2.95      10     2.90       0.20            2           20         0.162        1.620
2.95–3.05       2     3.00       0.30            3            6         0.262        0.524
Total          50      -          -              -           19           -          5.652

A.M. $= A + \dfrac{\sum fd'}{\sum f} \times i = 2.70 + \dfrac{19}{50} \times 0.1 = 2.70 + 0.038 = 2.738$ ft.


M.D. (About Mean) $= \dfrac{\sum f|D|}{\sum f} = \dfrac{5.652}{50} = 0.11304 = 0.11$ ft. (2 dec. pl.)
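
For a continuous series the mid-values stand in for the individual observations. The Python sketch below recomputes the example on iron rods directly from the class boundaries; the variable names are illustrative, and the mean is found from the mid-values without the step-deviation short cut.

```python
# Class boundaries (ft.) and frequencies of the 50 iron rods.
boundaries = [2.35, 2.45, 2.55, 2.65, 2.75, 2.85, 2.95, 3.05]
freqs = [1, 4, 7, 15, 11, 10, 2]

# Mid-value of each class.
mids = [(lower + upper) / 2 for lower, upper in zip(boundaries, boundaries[1:])]

n = sum(freqs)
am = sum(f * x for f, x in zip(freqs, mids)) / n

# Mean deviation about the arithmetic mean.
md = sum(f * abs(x - am) for f, x in zip(freqs, mids)) / n

print(round(am, 3), round(md, 5), round(md, 2))
```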
About Median.


Example. Calculate mean deviation from median and the
corresponding coefficient of dispersion for the following data :


Height in inches        No. of saplings      Height in inches         No. of saplings

4.5 and under 5.5              2             8.5 and under 9.5              25
5.5 and under 6.5              6             9.5 and under 10.5             20
6.5 and under 7.5             12             10.5 and under 11.5            10
7.5 and under 8.5             18             11.5 and under 12.5             7

Take $\frac{N+1}{2}$ as the rank of median. [C. A. 1968]


Computation of Mean Deviation (About Median)


Height         f      Cum. freq.     Mid-      Dev. from        f|D|
(inches)                             value     median (9)
                                       x          |D|
  (1)         (2)        (3)          (4)         (5)       (6)=(2)×(5)
 4.5–5.5        2          2            5           4             8
 5.5–6.5        6          8            6           3            18
 6.5–7.5       12         20            7           2            24
 7.5–8.5       18         38            8           1            18
 8.5–9.5       25         63            9           0             0
 9.5–10.5      20         83           10           1            20
10.5–11.5      10         93           11           2            20
11.5–12.5       7        100 (= N)     12           3            21
Total         100          -            -           -           129


Median = size of $\frac{N+1}{2}$th item = size of $\frac{101}{2}$th item
= size of 50.5th item.


So median lies in the class (8.5–9.5).
Median $= 8.5 + \dfrac{9.5 - 8.5}{25}(50.5 - 38) = 8.5 + 0.5 = 9$ inches.


M. D. $= \dfrac{\sum f|D|}{\sum f} = \dfrac{129}{100} = 1.29$ inches.


Coefficient of dispersion (about median) $= \dfrac{\text{M.D.}}{\text{median}} = \dfrac{1.29}{9} = 0.143$.


Note : In continuous series, $\frac{N}{2}$ is used as the rank of median. But here we are
asked to use $\frac{N+1}{2}$.
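
The example on saplings can be checked with the Python sketch below. It is an illustration under the same convention as the example, namely the rank (N + 1)/2 for the median; the function name `grouped_median` and the tuple layout of the classes are assumptions made here.

```python
# (lower boundary, upper boundary, number of saplings)
classes = [
    (4.5, 5.5, 2), (5.5, 6.5, 6), (6.5, 7.5, 12), (7.5, 8.5, 18),
    (8.5, 9.5, 25), (9.5, 10.5, 20), (10.5, 11.5, 10), (11.5, 12.5, 7),
]

def grouped_median(classes, rank):
    """Interpolate within the class containing the item of the given rank."""
    cumulative = 0
    for lower, upper, freq in classes:
        if cumulative + freq >= rank:
            return lower + (upper - lower) / freq * (rank - cumulative)
        cumulative += freq
    raise ValueError("rank exceeds the total frequency")

n = sum(freq for _, _, freq in classes)
median = grouped_median(classes, (n + 1) / 2)   # 50.5th item, as asked

# Deviations are taken from the mid-value of each class.
md = sum(freq * abs((lower + upper) / 2 - median)
         for lower, upper, freq in classes) / n
coefficient = md / median

print(round(median, 2), round(md, 2), round(coefficient, 3))
```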


Computation of Mean Deviation—Short-cut Method. 


When the median (or mean) is a fraction, calculation becomes
difficult. In such a case, the following short-cut method is used for
computation.


M.D. $= \dfrac{\sum u - \sum l}{n}$, where u = items greater than the median,
l = items lower than the median.
Example : To calculate the mean deviation about median in
the following series of marks : 15, 14, 17, 20, 12, 24, 21, 27, 26, 30.


Arrangement : 12, 14, 15, 17, 20, 21, 24, 26, 27, 30 ; n = 10.


Median = size of $\frac{n+1}{2}$th item = size of 5.5th item = 20.5 marks.


Items greater than the median (20.5) are 21, 24, 26, 27, 30 ; their total = 128.


Items less than the median are 12, 14, 15, 17, 20 ; their total = 78.


∴ M.D. $= \dfrac{128 - 78}{10} = \dfrac{50}{10} = 5$ marks.
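
The short-cut may be compared with the direct method in a small Python sketch. The function names are illustrative. Note one assumption of the sketch: when the number of items is odd, the middle item (the median itself) is left out of both totals, since its deviation is zero.

```python
from statistics import median

def short_cut_md(values):
    """M.D. about the median = (sum of upper half - sum of lower half) / n."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]        # items below the median
    upper = data[n - half:]    # items above the median
    return (sum(upper) - sum(lower)) / n

def direct_md(values):
    """M.D. about the median from the deviations themselves."""
    centre = median(values)
    return sum(abs(v - centre) for v in values) / len(values)

marks = [15, 14, 17, 20, 12, 24, 21, 27, 26, 30]
print(short_cut_md(marks), direct_md(marks))
```

Both routes give the same figure, and the short-cut never needs the fractional median 20.5 in the arithmetic.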


Advantages and Disadvantages of Mean Deviation. 


Advantages : 
(1) It is based on all the observations. Any change in any
item would change the value of mean deviation.
(2) It is readily understood. It is the average of the deviations
from a measure of central tendency.
(3) Mean deviation is less affected by the extreme items than
the standard deviation.


(4) It is simple to understand and easy to compute.
Disadvantages :


(1) Mean deviation ignores the algebraic signs of the deviations,
and as such it is not capable of further algebraic treatment.


(2) It is not an accurate measure, particularly when it is
calculated from mode.
(3) It is not as popular as standard deviation.


Uses of Mean Deviation.

Because of its simplicity in computation, it has drawn the attention
of economists and businessmen. It is useful in reports meant for
the public.

Standard Deviation. 


In calculating mean deviation we ignored the algebraic signs,
which is mathematically illogical. This drawback is removed in
calculating standard deviation, usually denoted by ‘σ’ (read as sigma).


Definition :


Standard deviation is the square root of the arithmetic average
of the squares of all the deviations from the mean. In short, it may
be defined as the root-mean-square deviation from the mean.


If $\bar{x}$ is the mean of $x_1, x_2, \ldots, x_n$, then σ is defined by

$\sigma = \sqrt{\dfrac{1}{n}\left\{(x_1 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2\right\}} = \sqrt{\dfrac{1}{n}\sum (x_i - \bar{x})^2}$


Computation of Standard Deviation.


For INDIVIDUAL OBSERVATIONS :


Computation may be done in two ways : (a) by taking deviations
from the actual mean, (b) by taking deviations from an assumed mean.


(a) Steps to follow : (1) Find the actual mean,
(2) Find the deviations from the mean,
(3) Square the deviations and add them up,
(4) Divide the sum by the total number of
items and find the square root.
(b) Steps to follow : (1) Find the deviations of the items from an
assumed mean and denote them by d. Find
also $\sum d$,

(2) Square the deviations, find $\sum d^2$,
(3) Apply the following formula to find the standard
deviation :


S. D. (σ) $= \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$


Example : The table below shows the marks obtained by
10 students in a certain test. Calculate the standard deviation by
both the above methods.


Roll No. :    1    2    3    4    5    6    7    8    9   10
Marks :      43   48   65   57   31   60   37   48   78   58


Computation of Standard Deviation


                  Method (a)                          Method (b)
Roll     Marks    Dev. from      Square of    Marks    Dev. from      Square of
No.               actual mean    dev.                  assumed mean   dev.
                  52.5 (d)       (d²)                  50 (d)         (d²)
  1        43       -9.5           90.25        43        -7             49
  2        48       -4.5           20.25        48        -2              4
  3        65       12.5          156.25        65        15            225
  4        57        4.5           20.25        57         7             49
  5        31      -21.5          462.25        31       -19            361
  6        60        7.5           56.25        60        10            100
  7        37      -15.5          240.25        37       -13            169
  8        48       -4.5           20.25        48        -2              4
  9        78       25.5          650.25        78        28            784
 10        58        5.5           30.25        58         8             64
Total     525         -          1746.50       525     Σd = 25      Σd² = 1809
For Method (a),


A. M. $= \frac{1}{10} \times 525 = 52.5$ marks.


S. D. (σ) $= \sqrt{\dfrac{1746.50}{10}} = \sqrt{174.65} = 13.22$ marks.


Here the average marks are 52.5, and the marks deviate on an average
from the average by 13.22 marks.


For Method (b),
S. D. (σ) $= \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2} = \sqrt{\dfrac{1809}{10} - \left(\dfrac{25}{10}\right)^2} = \sqrt{180.9 - 6.25}$
$= \sqrt{174.65} = 13.22$ marks.


Note. If the actual mean is a fraction, then it is better to take deviations
from an assumed mean, to avoid lengthy calculations.
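
That the two methods must agree can be seen from a short Python sketch. The function names are chosen here for illustration; both divide by the number of items n, as in the definition of σ above.

```python
from math import sqrt

def sd_actual_mean(values):
    """Method (a): root-mean-square deviation from the actual mean."""
    n = len(values)
    m = sum(values) / n
    return sqrt(sum((x - m) ** 2 for x in values) / n)

def sd_assumed_mean(values, assumed):
    """Method (b): deviations d from an assumed mean, with a correction term."""
    n = len(values)
    d = [x - assumed for x in values]
    return sqrt(sum(v * v for v in d) / n - (sum(d) / n) ** 2)

marks = [43, 48, 65, 57, 31, 60, 37, 48, 78, 58]
sd_a = sd_actual_mean(marks)
sd_b = sd_assumed_mean(marks, 50)
print(round(sd_a, 2), round(sd_b, 2))
```

Any other assumed mean, say 40 or 60, gives the same σ, because the correction term removes the effect of the choice.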


For DISCRETE SERIES :


There are three methods, given below, for computing Standard
Deviation :


(a) Actual Mean, (b) Assumed Mean, (c) Step Deviation.


For (a), the following formula is used :

$\sigma = \sqrt{\dfrac{\sum f x^2}{\sum f}}$, where $x = (X - \bar{X})$.

This method is rarely used because, if the actual mean is a
fraction, the calculations take much time.


For (b), the following are the steps to be used :


(i) Find the deviations (from ass. mean), denote them by d,
(ii) Obtain $\sum fd$,
(iii) Find $d^2$ and then $\sum fd^2$,
then use the formula :

$\sigma = \sqrt{\dfrac{\sum fd^2}{\sum f} - \left(\dfrac{\sum fd}{\sum f}\right)^2}$
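Example. Find the standard deviation of the following series :


x:   10   11   12   13   14   Total
f:    3   12   18   12    3    48


[C. A. May 1963]


Calculation of Standard Deviation (assumed mean = 12)


x     | f    | d    | fd          | d²          | fd²
(1)   | (2)  | (3)  | (4)=(2)×(3) | (5)=(3)×(3) | (6)=(2)×(5)
10    |  3   | -2   |  -6         | 4           | 12
11    | 12   | -1   | -12         | 1           | 12
12    | 18   |  0   |   0         | 0           |  0
13    | 12   |  1   |  12         | 1           | 12
14    |  3   |  2   |   6         | 4           | 12
Total | 48   |      | Σfd = 0     |             | Σfd² = 48


$\sigma = \sqrt{\dfrac{\sum f d^2}{\sum f} - \left(\dfrac{\sum f d}{\sum f}\right)^2} = \sqrt{\dfrac{48}{48} - \left(\dfrac{0}{48}\right)^2} = 1$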
For (c), the following formula is used :

$\sigma = \sqrt{\dfrac{\sum f d'^2}{\sum f} - \left(\dfrac{\sum f d'}{\sum f}\right)^2} \times i$, where d' = step deviation,
i = common factor.

The idea will be clear from the example shown below :

Example:
Find the standard deviation for the following distribution :

x:   4.5  14.5  24.5  34.5  44.5  54.5  64.5
f:    2     3     5    17    12     7     4

$\sigma = \sqrt{\dfrac{\sum f d'^2}{\sum f} - \left(\dfrac{\sum f d'}{\sum f}\right)^2} \times i = \sqrt{\dfrac{111}{50} - \left(\dfrac{21}{50}\right)^2} \times 10$

$= \sqrt{2.22 - 0.1764} \times 10 = 1.4295 \times 10 = 14.295$.

(The computation table is shown below.)
Calculation of Standard Deviation


x     | f       | d    | d' = d/10 | fd'       | fd'²
4.5   |  2      | -30  | -3        |  -6       | 18
14.5  |  3      | -20  | -2        |  -6       | 12
24.5  |  5      | -10  | -1        |  -5       |  5
34.5  | 17      |   0  |  0        |   0       |  0
44.5  | 12      |  10  |  1        |  12       | 12
54.5  |  7      |  20  |  2        |  14       | 28
64.5  |  4      |  30  |  3        |  12       | 36
Total | Σf = 50 |      |           | Σfd' = 21 | Σfd'² = 111
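The step deviation method can be sketched in a few lines of Python. This is an illustrative sketch only; the function name `step_deviation_stats` and its parameters are assumptions, not taken from the text. It uses the distribution of the example above with assumed mean 34.5 and common factor 10.

```python
import math

def step_deviation_stats(x, f, assumed, i):
    """Return (mean, S.D.) of a frequency distribution by step deviations.

    x: values (or mid-values), f: frequencies,
    assumed: assumed mean A, i: common factor.
    """
    n = sum(f)
    d_step = [(xi - assumed) / i for xi in x]              # d' = (x - A) / i
    s1 = sum(fi * di for fi, di in zip(f, d_step))         # sum of f d'
    s2 = sum(fi * di * di for fi, di in zip(f, d_step))    # sum of f d'^2
    mean = assumed + s1 / n * i
    sd = math.sqrt(s2 / n - (s1 / n) ** 2) * i
    return mean, sd

x = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
f = [2, 3, 5, 17, 12, 7, 4]
mean, sd = step_deviation_stats(x, f, assumed=34.5, i=10)
print(round(mean, 2), round(sd, 3))
```

The same function applies to a continuous series once each class is replaced by its mid-value.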


For CONTINUOUS SERIES :


Any method discussed above (for discrete series) can be used in
this case. The step deviation method is, however, the most convenient.
The procedure of calculation will be clear from the following example.


Example: Find the standard deviation from the following
frequency distribution :


Weight (lb.):     131-140  141-150  151-160  161-170  171-180  181-190  191-210  211-240
No. of persons :     2        5        4        9        7        5        3        1


[I.C.W.A. Jan. 1971]
Calculation of Standard Deviation


Weight (lb.) | f  | Mid-value x | d   | d' = d/5 | fd' | fd'²
131-140      |  2 | 135.5       | -30 |  -6      | -12 |  72
141-150      |  5 | 145.5       | -20 |  -4      | -20 |  80
151-160      |  4 | 155.5       | -10 |  -2      |  -8 |  16
161-170      |  9 | 165.5       |   0 |   0      |   0 |   0
171-180      |  7 | 175.5       |  10 |   2      |  14 |  28
181-190      |  5 | 185.5       |  20 |   4      |  20 |  80
191-210      |  3 | 200.5       |  35 |   7      |  21 | 147
211-240      |  1 | 225.5       |  60 |  12      |  12 | 144
Total        | 36 |             |     |          |  27 | 567


$\sigma = \sqrt{\dfrac{\sum f d'^2}{\sum f} - \left(\dfrac{\sum f d'}{\sum f}\right)^2} \times i = \sqrt{\dfrac{567}{36} - \left(\dfrac{27}{36}\right)^2} \times 5$

$= \sqrt{15.75 - 0.5625} \times 5 = \sqrt{15.1875} \times 5$

$= 3.897 \times 5 = 19.485 \approx 19.5$ lbs. (app.) (Calculation by log table)


Note. If we are to find the mean, then


A.M. $= A + \dfrac{\sum f d'}{\sum f} \times i = 165.5 + \dfrac{27}{36} \times 5 = 165.5 + 0.75 \times 5$


$= 165.5 + 3.75 = 169.25$ lbs.


Mathematical Properties of Standard Deviation. 


(1) COMBINED STANDARD DEVIATION :


We can also calculate the combined standard deviation for two or
more groups, similar to the mean of a composite group. The required
formula is as follows :


$\sigma_{12} = \sqrt{\dfrac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$

where $\sigma_{12}$ = combined standard deviation of two groups,
$\sigma_1$ = standard deviation of 1st group,
$\sigma_2$ = standard deviation of 2nd group,
$d_1 = \bar{x}_1 - \bar{x}_{12}$ ; $d_2 = \bar{x}_2 - \bar{x}_{12}$.

For Three Groups,


$\sigma_{123} = \sqrt{\dfrac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_3\sigma_3^2 + n_1 d_1^2 + n_2 d_2^2 + n_3 d_3^2}{n_1 + n_2 + n_3}}$

where $d_1 = \bar{x}_1 - \bar{x}_{123}$ ; $d_2 = \bar{x}_2 - \bar{x}_{123}$ ; $d_3 = \bar{x}_3 - \bar{x}_{123}$.


Example : Two samples of sizes 40 and 50 respectively have the
same mean 53, but different standard deviations 19 and 8 respectively.
Find the standard deviation of the combined sample of size 90.


[C. A. Nov. 1963]

Here, $n_1 = 40$, $\bar{x}_1 = 53$, $\sigma_1 = 19$ ;
$n_2 = 50$, $\bar{x}_2 = 53$, $\sigma_2 = 8$.

Now, $\bar{x}_{12} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \dfrac{40 \times 53 + 50 \times 53}{40 + 50} = \dfrac{2120 + 2650}{90} = \dfrac{4770}{90} = 53$.


Now, $d_1 = \bar{x}_1 - \bar{x}_{12} = 53 - 53 = 0$, $d_2 = 0$.

$\sigma_{12} = \sqrt{\dfrac{40(19)^2 + 50(8)^2 + 40(0)^2 + 50(0)^2}{40 + 50}} = \sqrt{\dfrac{14440 + 3200}{90}} = \sqrt{196} = 14$.


Example: The number of workers employed, the Mean Wages
(in Rs.) per month and the Standard Deviation (in Rs.) in each
section of a factory are given below. Calculate the Mean Wages and
Standard Deviation of all the workers taken together.


Section | No. of workers employed | Mean wages (in Rs.) | Standard deviation (in Rs.)
A       | 50                      | 113                 | 6
B       | 60                      | 120                 | 7
C       | 90                      | 115                 | 8


[I.C.W.A. Jan. 1964]

$\bar{x}_{123} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3}{n_1 + n_2 + n_3} = \dfrac{50 \times 113 + 60 \times 120 + 90 \times 115}{50 + 60 + 90} = \dfrac{5650 + 7200 + 10350}{200} = \dfrac{23200}{200}$ = Rs. 116.

Now, $d_1 = \bar{x}_1 - \bar{x}_{123} = 113 - 116 = -3$,
$d_2 = \bar{x}_2 - \bar{x}_{123} = 120 - 116 = 4$,
$d_3 = \bar{x}_3 - \bar{x}_{123} = 115 - 116 = -1$.


$\sigma_{123} = \sqrt{\dfrac{50(6)^2 + 60(7)^2 + 90(8)^2 + 50(-3)^2 + 60(4)^2 + 90(-1)^2}{50 + 60 + 90}}$


$= \sqrt{\dfrac{1800 + 2940 + 5760 + 450 + 960 + 90}{200}} = \sqrt{\dfrac{12000}{200}}$


$= \sqrt{60}$ = Rs. 7.75.
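The combined formula extends to any number of groups. The following Python sketch is offered only as an illustration; the function name `combined_mean_sd` is an assumption. It reproduces both worked examples above.

```python
import math

def combined_mean_sd(sizes, means, sds):
    """Combined mean and S.D. of several groups.

    Uses the sum of n_k * (sigma_k^2 + d_k^2) divided by the total size,
    where d_k is the group mean minus the combined mean.
    """
    n = sum(sizes)
    mean = sum(nk * mk for nk, mk in zip(sizes, means)) / n
    variance = sum(
        nk * (sk ** 2 + (mk - mean) ** 2)
        for nk, mk, sk in zip(sizes, means, sds)
    ) / n
    return mean, math.sqrt(variance)

# Two samples with the same mean 53.
m2, s2 = combined_mean_sd([40, 50], [53, 53], [19, 8])
# Three sections of the factory.
m3, s3 = combined_mean_sd([50, 60, 90], [113, 120, 115], [6, 7, 8])
print(m2, s2, m3, round(s3, 2))
```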
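(2) Prove that the Standard Deviation does not depend on the
choice of origin.


For the n observations $x_1, x_2, \ldots, x_n$, let $d_1, d_2, \ldots, d_n$ be the respective
quantities obtained by shifting the origin to an arbitrary constant,
say A, so that $d_i = x_i - A$ (for i = 1, 2, ..., n). We are to show that $\sigma_x = \sigma_d$.


We know $\sigma_x^2 = \sum (x_i - \bar{x})^2 / n$, where $\bar{x} = \sum x_i / n$.
Now $d_i = x_i - A$, so that $\sum d_i = \sum x_i - \sum A$ (summing both sides).


Again $\dfrac{\sum d_i}{n} = \dfrac{\sum x_i}{n} - \dfrac{nA}{n}$ (dividing by n)


or, $\bar{d} = \bar{x} - A$ or, $\bar{x} = A + \bar{d}$.
Now, $x_i - \bar{x} = (A + d_i) - (A + \bar{d}) = d_i - \bar{d}$.
So $\sigma_x^2 = \sum (x_i - \bar{x})^2 / n = \sum (d_i - \bar{d})^2 / n = \sigma_d^2$.


$\therefore \ \sigma_x = \sigma_d$.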


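(3) Prove that the Standard Deviation calculated from two values
$x_1$ and $x_2$ of a variable x is equal to half their difference.
[I.C.W.A. Jan. 1968]


We know $\sigma^2 = \dfrac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2}{2}$, according to the definition of $\sigma$,


where $\bar{x} = \frac{1}{2}(x_1 + x_2)$, i.e., $\bar{x}$ is the A.M. of $x_1$ and $x_2$.

Putting the value of $\bar{x}$,

$\sigma^2 = \dfrac{1}{2}\left[\left(\dfrac{x_1 - x_2}{2}\right)^2 + \left(\dfrac{x_2 - x_1}{2}\right)^2\right]$


$= \dfrac{1}{2} \cdot \dfrac{1}{4}\left\{(x_1 - x_2)^2 + (x_1 - x_2)^2\right\} = \dfrac{1}{4}(x_1 - x_2)^2$.

$\therefore \ \sigma = \frac{1}{2}\,|x_1 - x_2|$, since $\sigma$ is always positive.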
(4) Show that if $\bar{x}$ is the Arithmetic Mean of the quantities
$x_1, x_2, \ldots, x_n$, then $\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - n\bar{x}^2$.


[ C. U. M. Com. 1968 ; I.C.W.A. Jan. 1969 ]

$\sum (x_i - \bar{x})^2 = \sum (x_i^2 - 2x_i\bar{x} + \bar{x}^2) = \sum x_i^2 - 2\bar{x}\sum x_i + n\bar{x}^2$

$= \sum x_i^2 - 2\bar{x} \cdot n\bar{x} + n\bar{x}^2$   [ $\bar{x} = \sum x_i / n$ or, $\sum x_i = n\bar{x}$ ]

$= \sum x_i^2 - 2n\bar{x}^2 + n\bar{x}^2$

$= \sum x_i^2 - n\bar{x}^2$.


(5) Prove that the standard deviation is independent of any
change of origin, but is dependent on the change of scale.
[I.C.W.A. Jan. 1971 ; Dec. 78]


WITHOUT FREQUENCY :
For the n observations $x_1, x_2, \ldots, x_n$, let the origin be changed
to A and the scale to d, so that $y_i = \dfrac{x_i - A}{d}$.


Here $y_1, y_2, \ldots, y_n$ are the deviations of $x_1, x_2, \ldots, x_n$ from an arbitrary
constant A, in units of another constant d (taken as positive).


Now, $x_i = A + d\,y_i$, so that $\bar{x} = A + d\,\bar{y}$, i.e., mean of x's = A + d (mean of y's).
Again, $x_i - \bar{x} = (A + d\,y_i) - (A + d\,\bar{y}) = d\,(y_i - \bar{y})$.

$\sigma_x^2 = \dfrac{\sum (x_i - \bar{x})^2}{n} = \dfrac{\sum d^2 (y_i - \bar{y})^2}{n} = d^2 \cdot \dfrac{\sum (y_i - \bar{y})^2}{n} = d^2 \sigma_y^2$


or, $\sigma_x = d\,\sigma_y$ (A is absent, but d is present).


This shows that the S.D. is unaffected by any change of origin, but
depends on the scale.


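WITH FREQUENCY :

S.D. $(\sigma_x) = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}}$, where $\bar{x}$ = actual A.M. (weighted mean) of the
variates x. Changing the origin to A (say), let $u = x - A$ or, $x = u + A$,
i.e., $\bar{x} = \bar{u} + A$.


Now, $\sigma_x = \sqrt{\dfrac{\sum f (u + A - \bar{u} - A)^2}{\sum f}} = \sqrt{\dfrac{\sum f (u - \bar{u})^2}{\sum f}} = \sigma_u$.


$\therefore \ \sigma_x = \sigma_u$, i.e., S.D. is unaffected by change of origin.

Now for change of scale, let $u = \dfrac{x - A}{d}$, d = width of class,
or, $ud = x - A$
or, $x = ud + A$, i.e., $\bar{x} = \bar{u}d + A$.
Substituting the value in $\sigma_x$ we find,

$\sigma_x = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{\sum f}} = \sqrt{\dfrac{\sum f (ud + A - \bar{u}d - A)^2}{\sum f}} = \sqrt{\dfrac{\sum f (ud - \bar{u}d)^2}{\sum f}}$

$= \sqrt{\dfrac{\sum f (u - \bar{u})^2 d^2}{\sum f}} = d \times \sqrt{\dfrac{\sum f (u - \bar{u})^2}{\sum f}} = d\,\sigma_u$,


i.e., S.D. is affected by change of scale.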
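The two results can also be seen numerically. The short Python sketch below is illustrative only: the data (the ten marks used earlier in this chapter) and the constants A = 50 and d = 5 are arbitrary choices made here, and `statistics.pstdev` from the standard library computes the standard deviation with divisor n.

```python
import statistics

x = [43, 48, 65, 57, 31, 60, 37, 48, 78, 58]
A, d = 50, 5  # arbitrary origin and (positive) scale

shifted = [xi - A for xi in x]        # change of origin only
scaled = [(xi - A) / d for xi in x]   # change of origin and scale

sd_x = statistics.pstdev(x)
sd_shifted = statistics.pstdev(shifted)
sd_scaled = statistics.pstdev(scaled)

# sd_shifted equals sd_x, while sd_x equals d times sd_scaled.
print(sd_x, sd_shifted, d * sd_scaled)
```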


(6) Find the A.M. and Standard Deviation of the first n natural
numbers.


1, 2, 3, ..., n are the first n natural numbers.

A.M. $= \dfrac{1 + 2 + 3 + \cdots + n}{n} = \dfrac{n(n+1)/2}{n} = \dfrac{n+1}{2}$ ... (1)
(for the sum formula, see A.P. chapter)

Again we know, $1^2 + 2^2 + 3^2 + \cdots + n^2 = \dfrac{n(n+1)(2n+1)}{6}$ ... (2)
(see A.P. chapter)


Now, $\sigma^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2$ (see the formula given in the definition of S.D.)

$= \dfrac{n(n+1)(2n+1)}{6n} - \left(\dfrac{n+1}{2}\right)^2 = \dfrac{(n+1)(2n+1)}{6} - \dfrac{(n+1)^2}{4}$

$= \dfrac{(n+1)\{2(2n+1) - 3(n+1)\}}{12} = \dfrac{(n+1)(4n + 2 - 3n - 3)}{12} = \dfrac{(n+1)(n-1)}{12} = \dfrac{n^2 - 1}{12}$.

$\therefore \ \sigma$ (S.D.) $= \sqrt{\dfrac{n^2 - 1}{12}}$.

Example: Find the A.M. and S.D. of the natural numbers 1 to 11.


A.M. $= \dfrac{n+1}{2} = \dfrac{11 + 1}{2} = 6$ (here n = 11).


S.D. $(\sigma) = \sqrt{\dfrac{n^2 - 1}{12}} = \sqrt{\dfrac{121 - 1}{12}} = \sqrt{10} = 3.162$.
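The closed-form result can be compared with a direct computation. The following Python sketch is an illustration only; the function name `natural_mean_sd` is an assumption.

```python
import math
import statistics

def natural_mean_sd(n):
    """A.M. and S.D. of 1, 2, ..., n from the closed-form results."""
    return (n + 1) / 2, math.sqrt((n * n - 1) / 12)

mean, sd = natural_mean_sd(11)

# Direct computation on the numbers 1 to 11 for comparison.
numbers = list(range(1, 12))
direct_mean = statistics.mean(numbers)
direct_sd = statistics.pstdev(numbers)
print(mean, sd, direct_mean, direct_sd)
```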
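Charlier's Check of Accuracy.
The accuracy of the computation can also be checked in the case of
standard deviation, by applying the following equation :

$\sum [f(d' + 1)^2] = \sum (f d'^2) + 2\sum (f d') + \sum f$


Example: Apply Charlier's Check of Accuracy and hence find the
Standard Deviation and Arithmetic Mean.


Wages (Rs.):      0-10  10-20  20-30  30-40  40-50
No. of Workers:     2      6     20     14      8


Wages (Rs.) | Mid-value x | f  | d' = (x - 25)/10 | fd' | fd'² | d' + 1 | f(d' + 1)²
0-10        |  5          |  2 | -2               | -4  |  8   | -1     |   2
10-20       | 15          |  6 | -1               | -6  |  6   |  0     |   0
20-30       | 25          | 20 |  0               |  0  |  0   |  1     |  20
30-40       | 35          | 14 |  1               | 14  | 14   |  2     |  56
40-50       | 45          |  8 |  2               | 16  | 32   |  3     |  72
Total       |             | 50 |                  | 20  | 60   |        | 150


Now, $\sum [f(d' + 1)^2] = 150$.
Again, $\sum (f d'^2) + 2\sum (f d') + \sum f = 60 + 2 \times 20 + 50 = 60 + 40 + 50 = 150$.
Hence the calculations are correct.


Now, $\sigma = \sqrt{\dfrac{\sum f d'^2}{\sum f} - \left(\dfrac{\sum f d'}{\sum f}\right)^2} \times i = \sqrt{\dfrac{60}{50} - \left(\dfrac{20}{50}\right)^2} \times 10 = \sqrt{1.2 - 0.16} \times 10$

$= \sqrt{1.04} \times 10 = 1.0198 \times 10$ = Rs. 10.20 (app.)


A.M. $= A + \dfrac{\sum f d'}{\sum f} \times i = 25 + \dfrac{20}{50} \times 10 = 25 + 4$ = Rs. 29.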
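Charlier's check compares totals worked out by hand, so it is useful mainly for catching slips in the columns of the table. A minimal Python sketch follows; the function name `charlier_check` is illustrative, and the step deviations assume the mid-value 25 as origin and a common factor of 10, which is consistent with the totals Σfd' = 20 and Σfd'² = 60 of the example.

```python
def charlier_check(f, d_step, sum_fd, sum_fd2):
    """Return True if the hand-computed totals satisfy Charlier's identity.

    f: frequencies, d_step: step deviations d',
    sum_fd, sum_fd2: totals of the fd' and fd'^2 columns as computed by hand.
    """
    lhs = sum(fi * (di + 1) ** 2 for fi, di in zip(f, d_step))
    rhs = sum_fd2 + 2 * sum_fd + sum(f)
    return lhs == rhs

f = [2, 6, 20, 14, 8]
d_step = [-2, -1, 0, 1, 2]  # assumed origin 25, common factor 10

ok = charlier_check(f, d_step, sum_fd=20, sum_fd2=60)    # correct totals
slip = charlier_check(f, d_step, sum_fd=20, sum_fd2=62)  # a deliberate slip
print(ok, slip)
```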


Variance. 


The square of the Standard Deviation is known as Variance. 
Coefficient of Variation.


It is the ratio of the Standard Deviation to the Mean expressed
as a percentage. This relative measure was first suggested by Professor
Karl Pearson. According to him, the coefficient of variation is the
percentage variation in the Mean, while the Standard Deviation is the total
variation in the Mean.


Symbolically, Coefficient of variation $(V) = \dfrac{\sigma}{\bar{x}} \times 100$ = Coefficient
of standard deviation × 100.


Note. The coefficient of variation is also known as the coefficient of variability.


Example :

If the Mean and Standard Deviation of a series are respectively 40
and 10, then the coefficient of variation would be $\frac{10}{40} \times 100 = 25\%$,
which means the Standard Deviation is 25% of the Mean.


Note. It is commonly used for comparing the variability of two or more series.
A series having a greater coefficient of variation is said to be more variable, i.e., less
uniform, less stable or less consistent. Again, a series having a smaller coefficient of
variation is said to be less variable, i.e., more uniform, more stable or more consistent.


Example:

From the marks given below, obtained by two students taking the
same course, find out who is the more intelligent and who is the more
consistent student.


A:  58  59  60  65  66  52  75  31  46  48
B:  56  87  89  46  93  65  44  54  78  68


In order to find the more consistent of the two students,
we are to compute the mean marks and then the coefficients of variation
for comparison.


Computation of Mean and Standard Deviation


      For Student A          |       For Student B
Marks | d = x - 56 | d²      | Marks | d = x - 68 | d²
58    |   2        |    4    | 56    | -12        | 144
59    |   3        |    9    | 87    |  19        | 361
60    |   4        |   16    | 89    |  21        | 441
65    |   9        |   81    | 46    | -22        | 484
66    |  10        |  100    | 93    |  25        | 625
52    |  -4        |   16    | 65    |  -3        |   9
75    |  19        |  361    | 44    | -24        | 576
31    | -25        |  625    | 54    | -14        | 196
46    | -10        |  100    | 78    |  10        | 100
48    |  -8        |   64    | 68    |   0        |   0
560   |            | 1376    | 680   |            | 2936


For Student A: $\bar{x} = \dfrac{560}{10} = 56$ marks.

Now, $\sigma = \sqrt{\dfrac{\sum d^2}{n}} = \sqrt{\dfrac{1376}{10}} = \sqrt{137.6} = 11.73$ marks.

Again V (coefficient of variation) $= \dfrac{11.73}{56} \times 100 = 20.95\%$.

For Student B: $\bar{x} = \dfrac{680}{10} = 68$ marks.

Now, $\sigma = \sqrt{\dfrac{\sum d^2}{n}} = \sqrt{\dfrac{2936}{10}} = \sqrt{293.6} = 17.13$ marks.

Again $V = \dfrac{17.13}{68} \times 100 = 25.20\%$.


The average marks obtained by Student B are higher than those of
Student A, so Student B is the more intelligent. Again, since the coefficient
of variation of Student A is less than that of Student B, Student A
is the more consistent.
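The comparison of the two students can be reproduced with a few lines of Python. This is a sketch for illustration; the function name `coefficient_of_variation` is an assumption, and the marks used are those that agree with the worked totals (560 and 1376 for Student A, 680 and 2936 for Student B).

```python
import statistics

def coefficient_of_variation(values):
    """S.D. (with divisor n) as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

marks_a = [58, 59, 60, 65, 66, 52, 75, 31, 46, 48]
marks_b = [56, 87, 89, 46, 93, 65, 44, 54, 78, 68]

cv_a = coefficient_of_variation(marks_a)
cv_b = coefficient_of_variation(marks_b)

# The student with the smaller coefficient is the more consistent.
more_consistent = "A" if cv_a < cv_b else "B"
print(round(cv_a, 2), round(cv_b, 2), more_consistent)
```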


Example:


Suppose that samples of polythene bags from two
manufacturers, A and B, are tested by a prospective buyer for bursting
pressure, with the following results :


Bursting pressure (lb.) | Number of bags
                        |   A   |   B
 5.0- 9.9               |   2   |   9
10.0-14.9               |   9   |  11
15.0-19.9               |  29   |  18
20.0-24.9               |  54   |  32
25.0-29.9               |  11   |  27
30.0-34.9               |   5   |  13
Total                   | 110   | 110


Which set of bags has the highest average bursting pressure ?
Which has more uniform pressure ? If prices are the same, which
manufacturer's bags would be preferred by the buyer ? Why ?


[ B. Com. Delhi 1966 ]


Computation of Mean and Standard Deviation (A = 17.45, i = 5)


Bursting pressure | Mid-value | d' |     For bag A      |     For bag B
(lb.)             | x         |    | f   | fd' | fd'²   | f   | fd' | fd'²
 4.95- 9.95       |  7.45     | -2 |   2 |  -4 |   8    |   9 | -18 |  36
 9.95-14.95       | 12.45     | -1 |   9 |  -9 |   9    |  11 | -11 |  11
14.95-19.95       | 17.45     |  0 |  29 |   0 |   0    |  18 |   0 |   0
19.95-24.95       | 22.45     |  1 |  54 |  54 |  54    |  32 |  32 |  32
24.95-29.95       | 27.45     |  2 |  11 |  22 |  44    |  27 |  54 | 108
29.95-34.95       | 32.45     |  3 |   5 |  15 |  45    |  13 |  39 | 117
Total             |           |    | 110 |  78 | 160    | 110 |  96 | 304


For bag A: Mean $= A + \dfrac{\sum f d'}{\sum f} \times i = 17.45 + \dfrac{78}{110} \times 5 = 17.45 + 3.55 = 21$ lb.

$\sigma = \sqrt{\dfrac{160}{110} - \left(\dfrac{78}{110}\right)^2} \times 5 = \sqrt{1.455 - 0.503} \times 5 = \sqrt{0.952} \times 5 = 0.976 \times 5 = 4.88$ lb.

$V = \dfrac{4.88}{21} \times 100 = 23.24\%$.

For bag B: Mean $= 17.45 + \dfrac{96}{110} \times 5 = 17.45 + 4.36 = 21.81$ lb.

$\sigma = \sqrt{\dfrac{304}{110} - \left(\dfrac{96}{110}\right)^2} \times 5 = \sqrt{2.764 - 0.762} \times 5 = \sqrt{2.002} \times 5 = 1.415 \times 5 = 7.075$ lb.

$V = \dfrac{7.075}{21.81} \times 100 = 32.44\%$.


The bags of Manufacturer B have the higher average bursting pressure,
which is clear from the averages calculated above. Again, the bags of
Manufacturer A have more uniform pressure, since their coefficient of
variation is less than that of Manufacturer B. If the prices
are the same, the bags of Manufacturer A should be preferred by the
buyer because they have more uniform pressure.


Example: An analysis of the monthly wages paid to workers in two
firms, A and B, belonging to the same industry gives the following
results.


                                        Firm A      Firm B
No. of wage-earners                      586         648
Average monthly wages                  Rs. 52.5    Rs. 47.5
Variance of the distribution of wages    100         121


(a) Which firm, A or B, pays out the larger amount as monthly
wages ?

(b) Which firm, A or B, has greater variability in individual
wages ?

(c) Find the average monthly wages and the standard deviation
of the wages of all the workers in the two firms A and B taken together.


[ C. U. B.A. (Econ.) 1970 ; Madras B. Com. 1962 ; I.C.W.A. Jan. 1965 ]
(a) For firm A: Total wages = 586 × 52.5 = Rs. 30,765.
For firm B: Total wages = 648 × 47.5 = Rs. 30,780.
∴ Firm B pays the larger amount.


(b) For firm A: $\sigma^2 = 100$, ∴ $\sigma$ = Rs. 10.

Now, $V = \dfrac{\sigma}{\bar{x}} \times 100 = \dfrac{10}{52.5} \times 100 = 19.05$.


For Firm B: $V = \dfrac{11}{47.5} \times 100 = 23.16$ (here $\sigma$ = Rs. 11).


Firm B has greater variability, as its coefficient of variation
is greater than that of Firm A.

(c) Here, $n_1 = 586$, $\bar{x}_1 = 52.5$, $\sigma_1 = 10$ ;
$n_2 = 648$, $\bar{x}_2 = 47.5$, $\sigma_2 = 11$.

$\bar{x}_{12} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \dfrac{586 \times 52.5 + 648 \times 47.5}{586 + 648} = \dfrac{30,765 + 30,780}{1234} = \dfrac{61,545}{1234} = 49.87$ = Rs. 49.9 (app.)


Again, $d_1 = \bar{x}_1 - \bar{x}_{12} = 52.5 - 49.9 = 2.6$,
$d_2 = 47.5 - 49.9 = -2.4$.

$\sigma_{12} = \sqrt{\dfrac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}} = \sqrt{\dfrac{586(10)^2 + 648(11)^2 + 586(2.6)^2 + 648(-2.4)^2}{586 + 648}}$

$= \sqrt{\dfrac{58600 + 78408 + 3961 + 3732}{1234}} = \sqrt{\dfrac{144701}{1234}}$ = Rs. 10.83.


(Calculation by log table)


Advantages and Disadvantages of Standard Deviation. 


Advantages : 

(1) Standard deviation is based on all the observations and is 
rigidly defined. 

(2) It is amenable to algebraic treatment and possesses many 
mathematical properties. 

(3) It is less affected by fluctuations of sampling than most
other measures of dispersion.

(4) For comparing the variability of two or more series, the coefficient
of variation is considered the most appropriate measure, and this is
based on the standard deviation and the mean.


Disadvantages :
(1) It is not easy to understand and to calculate.


(2) It gives more weight to the extremes and less to the items
nearer to the mean, since the squares of large deviations
are proportionately greater than the squares of
comparatively small ones. The deviations 2 and 6 are in the ratio
of 1 : 3, but their squares 4 and 36 are in the ratio
of 1 : 9.


Uses of Standard Deviation. 


It is the best measure of dispersion, and should be used wherever
possible.


More Examples. 


(1) From a certain frequency distribution consisting of 18
observations the mean and the standard deviation were found to be
7 and 4 respectively. But on comparing with the original data it was
found that a figure 12 was miscopied as 21 in the calculations. Calculate
the correct mean and standard deviation. [I.C.W.A. 1965]

We know, $\sum x = n\bar{x} = 18 \times 7 = 126$ ; actual $\sum x = 126 - 21 + 12 = 117$.

∴ actual Mean $= \dfrac{\sum x}{n} = \dfrac{117}{18} = 6.5$.


Again $\sigma = \sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$


or, $4 = \sqrt{\dfrac{\sum x^2}{18} - 7^2}$


or, $16 = \dfrac{\sum x^2}{18} - 49$


or, $\dfrac{\sum x^2}{18} = 16 + 49 = 65$


or, $\sum x^2 = 65 \times 18 = 1170$.
Now actual $\sum x^2 = 1170 - (21)^2 + (12)^2 = 1170 - 441 + 144 = 873$.


∴ actual $\sigma = \sqrt{\dfrac{873}{18} - (6.5)^2} = \sqrt{48.5 - 42.25} = \sqrt{6.25} = 2.5$.
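The correction procedure of Example (1) works from the summary figures alone. A minimal Python sketch follows; the function name `corrected_mean_sd` and its parameter names are illustrative assumptions.

```python
import math

def corrected_mean_sd(n, mean, sd, wrong, right):
    """Correct a mean and S.D. after one value was miscopied.

    Recovers the sums of x and x^2 from the reported mean and S.D.,
    replaces the wrong value by the right one, and recomputes.
    """
    sum_x = n * mean - wrong + right
    sum_x2 = n * (sd ** 2 + mean ** 2) - wrong ** 2 + right ** 2
    new_mean = sum_x / n
    new_sd = math.sqrt(sum_x2 / n - new_mean ** 2)
    return new_mean, new_sd

# 18 observations, mean 7, S.D. 4; the figure 12 was miscopied as 21.
new_mean, new_sd = corrected_mean_sd(18, 7, 4, wrong=21, right=12)
print(new_mean, new_sd)
```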


(2) The Mean and S.D. of 25 observations were found
to be 30 and 3 respectively. After the calculations were made it was
found that two of the observations were incorrect, which were recorded
as 29 and 31. Find the mean and S.D. if the incorrect observations
are excluded. [C. U. B. Com. (Hons.) 1968]

We know, $\sum x = n\bar{x} = 25 \times 30 = 750$,


actual $\sum x = 750 - (29 + 31) = 690$.
So actual Mean $= \dfrac{690}{23} = 30$ (here, n = 25 - 2 = 23).

Again, $\sigma^2 = \dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2$

or, $9 = \dfrac{\sum x^2}{25} - (30)^2$


or, $\dfrac{\sum x^2}{25} = 9 + 900 = 909$
or, $\sum x^2 = 25 \times 909 = 22725$.
So actual $\sum x^2 = 22725 - (29)^2 - (31)^2$
$= 22725 - 841 - 961 = 20923$.
Now, actual $\sigma = \sqrt{\dfrac{20923}{23} - 900} = \sqrt{909.70 - 900} = \sqrt{9.70} = 3.11$.


(3) The Mean and the Variance calculated from a group of 80
observations are 63.2 and 25.93 respectively. If 60 of these observations
have mean = 64.8 and S.D. = 4, find the Mean and S.D. of the
remaining 20 observations. [I.C.W.A. July 1971]


We assume that the total, i.e., 80 observations have been split up
into two groups :


Group A contains 60 $(= n_1)$ observations with Mean $\bar{x}_1 = 64.8$ and
S.D. $\sigma_1 = 4$, and


Group B contains 20 $(= n_2)$ observations with mean $\bar{x}_2$ and
S.D. $\sigma_2$.

Now, for Group A: $n_1 = 60$, $\bar{x}_1 = 64.8$, $\sigma_1 = 4$ ;
for Group B: $n_2 = 20$, $\bar{x}_2 = ?$, $\sigma_2 = ?$ ;
for the combined Group: $n_1 + n_2 = 80$, $\bar{x}_{12} = 63.2$, $\sigma_{12}^2 = 25.93$.


Now, from $\bar{x}_{12} = \dfrac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ we find,

$63.2 = \dfrac{60 \times 64.8 + 20\bar{x}_2}{80}$
or, $20\bar{x}_2 = 63.2 \times 80 - 60 \times 64.8$
or, $20\bar{x}_2 = 5056 - 3888 = 1168$.
∴ $\bar{x}_2 = \dfrac{1168}{20} = 58.4$.

Now, $d_1 = \bar{x}_1 - \bar{x}_{12} = 64.8 - 63.2 = 1.6$,
$d_2 = \bar{x}_2 - \bar{x}_{12} = 58.4 - 63.2 = -4.8$.

Again $\sigma_{12} = \sqrt{\dfrac{60(4)^2 + 20\sigma_2^2 + 60(1.6)^2 + 20(-4.8)^2}{60 + 20}}$


$= \sqrt{\dfrac{960 + 20\sigma_2^2 + 153.6 + 460.8}{80}} = \sqrt{\dfrac{1574.4 + 20\sigma_2^2}{80}}$

or, $\sigma_{12}^2 = \dfrac{1574.4 + 20\sigma_2^2}{80}$

or, $25.93 = \dfrac{1574.4 + 20\sigma_2^2}{80}$


or, $20\sigma_2^2 = 25.93 \times 80 - 1574.4 = 2074.4 - 1574.4 = 500$


or, $\sigma_2^2 = \dfrac{500}{20} = 25$. ∴ $\sigma_2 = 5$.
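Example (3) runs the combined formula backwards. The Python sketch below is offered as an illustration under the same assumptions as the worked solution; the function name `remaining_group` is hypothetical.

```python
import math

def remaining_group(n, mean, variance, n1, mean1, sd1):
    """Mean and S.D. of the remaining group, given the combined figures.

    n, mean, variance: size, mean and variance of the combined group;
    n1, mean1, sd1: size, mean and S.D. of the known group.
    """
    n2 = n - n1
    mean2 = (n * mean - n1 * mean1) / n2
    d1, d2 = mean1 - mean, mean2 - mean
    # Solve the combined-variance formula for the unknown variance.
    var2 = (n * variance - n1 * (sd1 ** 2 + d1 ** 2) - n2 * d2 ** 2) / n2
    return mean2, math.sqrt(var2)

mean2, sd2 = remaining_group(80, 63.2, 25.93, 60, 64.8, 4)
print(round(mean2, 2), round(sd2, 2))
```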


(4) For two groups of observations, the following results are
available :


Group I                         Group II
$\sum (x - 5) = 3$              $\sum (x - 8) = -11$
$\sum (x - 5)^2 = 43$           $\sum (x - 8)^2 = 76$
$n_1 = 18$                      $n_2 = 17$


Find (correct to 3 significant figures) the mean and the standard
deviation of the 35 observations obtained by combining the two
groups. [I.C.W.A. July 1972]


For Group I: Mean $(\bar{x}_1) = 5 + \dfrac{3}{18} = 5 + 0.17 = 5.17$,

S.D. $(\sigma_1) = \sqrt{\dfrac{43}{18} - \left(\dfrac{3}{18}\right)^2} = \sqrt{2.389 - 0.028} = \sqrt{2.361} = 1.54$.

For Group II: Mean $(\bar{x}_2) = 8 - \dfrac{11}{17} = 8 - 0.65 = 7.35$,

S.D. $(\sigma_2) = \sqrt{\dfrac{76}{17} - \left(\dfrac{11}{17}\right)^2} = \sqrt{4.471 - 0.419} = \sqrt{4.052} = 2.01$.


Now, $\bar{x}_{12} = \dfrac{18(5.17) + 17(7.35)}{18 + 17} = \dfrac{218.01}{35} = 6.23$.


Again $d_1 = 5.17 - 6.23 = -1.06$ ;
$d_2 = 7.35 - 6.23 = 1.12$.

$\sigma_{12} = \sqrt{\dfrac{18(1.54)^2 + 17(2.01)^2 + 18(-1.06)^2 + 17(1.12)^2}{18 + 17}}$

$= \sqrt{\dfrac{42.69 + 68.68 + 20.22 + 21.32}{35}} = \sqrt{4.37} = 2.09$.
Lorenz Curve.


Dispersion can also be studied with the help of the Lorenz Curve, named
after Max O. Lorenz, the economist and statistician. He used this
curve to measure the distribution of wealth and income. The Lorenz Curve
is a cumulative percentage curve in which the percentage of items
under review is combined with the percentage of other things such as
wealth, profits, etc.


For drawing the curve, the sizes of the items and the frequencies are both
cumulated, and percentages are calculated for the cumulated
values. These percentages are then plotted on graph paper. If
wealth is equally distributed among the people concerned, the curve
would be a straight line joining the extremes of the two scales.
This line is known as the Line of equal distribution. If the distribution
is not proportionately equal, it indicates variability, and the curve
would be away from the line of equal distribution. The further the
curve is from this line, the greater is the variability. The Lorenz
Curve does not yield a numerical measure.


Example.


The table given below shows the number of companies
belonging to two areas A and B according to the amount of profits
earned by them. Draw in the same diagram their Lorenz Curves and
interpret them.


Profits earned in | No. of Companies
Rs. '000          | Area A | Area B
  6               |   6    |   2
 25               |  11    |  38
 60               |  13    |  52
 84               |  14    |  28
105               |  15    |  38
150               |  17    |  26
170               |  10    |  12
400               |  14    |   4


[I.C.W.A. 1964]
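Calculation for drawing the Lorenz Curve


   Profits (Rs. '000)       |          Area A          |          Area B
Amount | Cum.  | Cum. %     | No. | Cum. no. | Cum. %  | No. | Cum. no. | Cum. %
  6    |    6  |   0.6      |  6  |    6     |    6    |  2  |    2     |    1
 25    |   31  |   3.1      | 11  |   17     |   17    | 38  |   40     |   20
 60    |   91  |   9.1      | 13  |   30     |   30    | 52  |   92     |   46
 84    |  175  |  17.5      | 14  |   44     |   44    | 28  |  120     |   60
105    |  280  |  28.0      | 15  |   59     |   59    | 38  |  158     |   79
150    |  430  |  43.0      | 17  |   76     |   76    | 26  |  184     |   92
170    |  600  |  60.0      | 10  |   86     |   86    | 12  |  196     |   98
400    | 1000  | 100.0      | 14  |  100     |  100    |  4  |  200     |  100


Curve B shows the greater inequality, since it is further from the
line of equal distribution.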
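The points of the two Lorenz Curves are cumulative percentages, which can be worked out as in the following Python sketch. It is illustrative only: the function name `cumulative_percentages` is an assumption, and the profit figures are cumulated as listed in the example.

```python
from itertools import accumulate

def cumulative_percentages(values):
    """Cumulate the values and express each running total as a percentage."""
    running = list(accumulate(values))
    total = running[-1]
    return [100 * r / total for r in running]

profits = [6, 25, 60, 84, 105, 150, 170, 400]   # Rs. '000
area_a = [6, 11, 13, 14, 15, 17, 10, 14]        # number of companies
area_b = [2, 38, 52, 28, 38, 26, 12, 4]

pct_profits = cumulative_percentages(profits)
pct_a = cumulative_percentages(area_a)
pct_b = cumulative_percentages(area_b)

# Each curve is drawn through the pairs (% of companies, % of profits).
for p, a, b in zip(pct_profits, pct_a, pct_b):
    print(round(p, 1), round(a, 1), round(b, 1))
```

On the line of equal distribution the two percentages in each pair would be equal; the larger the gap between them, the further the curve lies from that line.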


EXERCISE 7 


1. Explain the term Dispersion. What purpose does a measure
of dispersion serve ? Distinguish between absolute and relative
measures of dispersion. [ Poona, B.Com. 1966 ]

2. Define Mean Deviation. How does it differ from Standard
Deviation ? [C. A. Nov. 1968] [I.C.W.A. Jan. 1964]

3. Explain why Standard Deviation is regarded as superior to
the other measures of dispersion. What is its chief defect ?

4. What are quartiles ? How are they used for measuring
dispersion ?

5. What is coefficient of variation ? What purpose does it serve ?
Also distinguish between 'variance' and 'coefficient of variation'.

6. What is a Lorenz Curve ? How is it drawn ? In what way
does it help in studying the variation of two or more distributions ?
Illustrate with the help of an example.

7. If each term is reduced by 10, what effect would this have
on (i) the Arithmetic Mean, (ii) the Range, (iii) the Standard Deviation ?
[C.A. May 1964]

( Ans. (i) reduced by 10 ; (ii) & (iii) no change )

8. The weights of 11 forty-year-old men were 148, 154, 158,
160, 161, 162, 166, 170, 182, 195 and 236 pounds. If the heaviest man
is omitted, what is the percentage change in the range ?
[ C. U. M.Com. 1968 ] (Ans. 46.6)

9. From the following table, compute the Quartile Deviation :


Size :   4-8   8-12  12-16  16-20  20-24  24-28  28-32  32-36  36-40
Freq.:    6     10     18     30     15     12     10      6      2


[C. A. May 1965] (Ans. 5.2)


10. The following table gives the monthly wages of 72 workers
in a factory. Compute the quartile deviation :


Monthly wages (Rs.) | No. of workers
12.5-17.5           |  2
17.5-22.5           | 22
22.5-27.5           | 19
27.5-32.5           | 14
32.5-37.5           |  3
37.5-42.5           |  4
42.5-47.5           |  6
47.5-52.5           |  1
52.5-57.5           |  1


[C. U. M. Com. 1962] ( Ans. Rs. 5.15 )
11. The following table shows the distribution of the maximum
loads supported by certain cables produced by a company :


Maximum load 4 
(short tons) ; a ae ed ee 
No. of cables ; 2 5 12 17 14 


ee oe 


a... 
at diadu ss) 


—Find the semi-interquartile range for the above distribution, 
(Ans. ‘50 short tons ) 


12. Find the mean absolute deviation of the following ‘observa- 
tions: 2, 4, 9, 16, 90, 10, 14, 18, 8, 10, (Ans, 4°12) 


13. Find the mean deviation of the following series :

x:   10   11   12   13   14   Total
f:    3   12   18   12    3    48

[C. A. May 1963] (Ans. 0.75)


14. Calculate the mean deviation for the following frequency
distribution :

No. of colds experienced No. of 
in 12 months persons 


No. of colds experienced 
in 12 months 


No. of 
persons 


‘ [L.C.W.A, Jan, 1963] (Ans, 1466 ) 

15. You are given the frequency distribution of 291 workers of a
factory according to their average monthly income in 1954-55.
Locate the median and quartiles. Find also the mean deviation about the median
and hence the coefficient of dispersion.
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Sanne group (Rs.)| No. of workers 


Income group (Rs.)| No. of workers 


Below 50 150—170 22 
50—70 170—190 15 
70—90 190—210 15 
90—110 210—230 9 

110—130 930 and above 10 
130—150 


(Ans. Med.=120°5 ; Q: =95°78 ; Qs =149°24 ; M.D.=88'11; 
coeff. disp. = "73 ) 


16. Calculate the standard deviation from the following table :


Days :               1     2     3     4     5     6     7     8     9    10
Daily Earn. (Rs.) : 1.50  1.00  1.25  2.25  2.00  2.50  3.00  1.50  3.00  2.00

(Ans. Rs. 0.66)


17. Find the standard deviation of the following distribution :
x:   4.5  14.5  24.5  34.5  44.5  54.5  64.5
f:    1     5    12    22    17     9     4
[ Delhi, B.A. (Hons.) 1969 ] ( Ans. 13.25 )
18. Calculate the standard deviation from the following data :


Temp. °C     | No. of days
-40 to -30   |  10
-30 to -20   |  28
-20 to -10   |  30
-10 to 0     |  42
0 to 10      |  65
10 to 20     | 180
20 to 30     |  10


[C. A. May 1966] ( Ans. 14.73 °C )
19. The following is the record of goals scored by team A in a 
football season : 
No. of goals scored: 0 1 2 3 4 
No. of matches points b 9 7 5 3 
For team B; the average number of goals scored per match was 
2°5 with a standard deviation of 1'25 goals. 


Find which team may be considered more consistent. 
[ C. A. Nov. 1963] (Ans. team B ) 
20. The following frequency distributions have been constructed
from measurements of heights (in inches) and weights (in lb.)
of the same group of adult persons. Which is more variable, height or
weight ?

Mid-point (ht.) Freq. Mid-point (weight) Freq. 
60 10 84 10 
62 10 94 10 
64 30 104 15 
66 30 114 20 
68 15 124 15 
70 5 134 10 
100 144 10 
154 10 
100 


[1.0.W.A. July 1968 ] ( Ans. weight ) 


21, The number of runs scored by cricketers A’and B during & 
test series consisting of 5 test matches is shown below for each of the } 
10 innings : 

TT 
Cricketer A: 5 26 97 76112 89 6108 94 16 


Cricketer B: 51 47 86 60 58 39 44 42 71 50 


Make a comparative study of their batting performance. : 
(Ans. Cricketer B is more consistent scorer ) 
22, The scores of two batsmen A and B, inter innings during a 
certain season are as under : 
Ae 82 28 47 63 ts 39 10 60 96 14 
B 19 31 48 53 67 90 10 62 40 80 


—Find which of the batsmen is more consistent in scoring. 
[ I. 0. W. A. Jan. 1970] ( Ans. Batsman B ) ‘ 


23. From the data given below, state which series is more
variable (use standard deviation) :


Variable | Series A | Series B
10-20    | 10       | 18
20-30    | 18       | 22
30-40    | 32       | 40
40-50    | 40       | 32
50-60    | 22       | 18
60-70    | 18       | 10


[C. A. Nov. 1971] (Ans. Series B)
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24. The following table gives the distribution of wages in the 
two branches of an industrial concern : 


Monthly wages (Rs.) No. of Workers 
Branch A Branch B 


100—150 167 63 
150—200 207 93 
200—250 253 157 
250—300 205 105 
300—350 168 82 

1000 500 


Find out the arithmetic mean and the standard deviation for the 
two branches separately ; state 


(i) which branch pays higher average wages per month ; 
(ii) which branch has greater variability in wages relative to the 
average wages ; and 
(iii) what is the average monthly wages for the concern as a 
whole. 
[ C. A. May 1964 ] ( Ans. (i) B ; (ii) A ; (iii) Rs. 226.67 )
25. For a set of 100 observations, the sum of deviations from
4 cm. is -11 cm., and the sum of the squares of those deviations is 257
square cm. Find the coefficient of variation.


[I.C.W.A. Jan. 1967] (Ans. 41.13)
26. (i) If the first quartile is 118 and the semi-interquartile range is
12, find the third quartile. (Ans. 142)
(ii) The coefficient of variation is 25 and the mean is 20 ; find the
standard deviation. (Ans. 5)


97. Given the following results relating to two groups containing 
20 and 80 observations, calculate the coefficient of variation of all of 
the 50 observations by combining both the groups : 


Groups 
Lin OB 
Sa 465 55 
Sax? 118 182 [1.C. W. A. Jan. 1968] (Ans. 50) 
28. The mean and S.D. calculated from 20 observations are 15
and 10 respectively. If an additional observation 5, left out through
oversight, be included in the calculations, find the corrected mean
and S.D. [I.C.W.A. Jan. 1969] (Ans. 14.52 ; 9.99)

29. The mean income per month of a friendly society of 25
members is Rs. 350 and the standard deviation is Rs. 50. Five more
members are admitted to the society and their incomes in Rs. per
month are 260, 300, 320, 490 and 590. Find the mean and standard
deviation of income for the new group of 30 members.


[I.C.W.A. July 1969] ( Ans. Rs. 357 ; Rs. 70.65 )


30. Given below the frequency distribution of the marks obtained 
by 90 students. Compute the A.M., Median, Mode and 8.D. 


Marks:  20—29|30—89/40—49150—59|60—69|70—79 s0—89|90 99 


No. of 
Students : 


[I. 0. W. A. June, ’76 ] ( Ans. 56°5 ; 56 ; 56°64 ; 17°6 marks ) 
31. (a) Define ‘Standard Deviation’ of ungrouped and grouped data. 
Discuss its merits and demerits as a measure of dispersion. 


(0) Caleutate the 8.D. of the following’ observations on a certain 
variable : ; 
ieee OS ah 


240°12, 240'13, 240°15, - 240'19, 240°17, 
240'15, 240°17, 240°16, 240°22, 240°21. | 
[1. 0. W. A. June 1976] ( Ans. (0303) 

32. A company has three establishments E1, E2, E3 in three
cities. An analysis of the monthly salaries paid to the employees in the
three establishments is given below :


                                        | E1  | E2  | E3
No. of employees                        |  20 |  25 |  40
Average monthly salaries (Rs.)          | 305 | 300 | 340
Standard dev. of monthly salaries (Rs.) |  50 |  40 |  45


Find the average and the standard deviation of the monthly
salaries of all the 85 employees in the company.


[I.C.W.A. Dec. 1976] ( Ans. Rs. 320 ; Rs. 48.69 )
33. Explain with a suitable example the term 'dispersion'.
Mention some common measures of dispersion and describe the one
which you think to be the most important of them.

[I.C.W.A. Dec. 1976]

34. The table below gives the frequency distribution of weights
of 80 apples selected at random from a big consignment :


bios 110—119 | 120—129 | 180—189 | 140—149 | 150—159 | 160—169| 170—179 | 180—189 
gm. : 


(a); Draw the cumulative frequency diagram and hence determine 
the median weight of an apple. 
(b) Find the coefficient of variation for this distribution. 
[ I. 0. W. A. Dec. 1976 ] (Ans. 147°5 gm. ; 11°75% ) 
35. An analysis of the monthly wages paid to workers in two
firms A and B, belonging to the same industry, gives the following
results :


                              Firm A       Firm B
Number of wage-earners :       550          650
Average monthly wages :       Rs. 50       Rs. 45
S.D. of the distributions
of wages :                    Rs. √90      Rs. √120


Answer the following questions with proper justifications :
(a) Which firm, A or B, pays out the larger amount as monthly
wages ?
(b) In which firm, A or B, is there greater variability in individual
wages ?
(c) What are the measures of (i) average monthly wages and
(ii) standard deviation in the distribution of individual wages of all
workers in the two firms taken together ? [I.C.W.A. June '77]
( Ans. (a) B ; (b) B ; (c) Rs. 47.29 ; Rs. 10.60 )
36. (a) Calculate the mean deviation from the median from the
following :


Class-intervals :  2-4 | 4-6 | 6-8 | 8-10
Frequencies :       3  |  4  |  2  |  1


(b) Calculate the standard deviation of the following distribution :


Age (x) :          20-25 | 25-30 | 30-35 | 35-40 | 40-45 | 45-50
No. of persons :    170  |  110  |   80  |   45  |   40  |   35


[I.C.W.A. Dec. 1977] ( Ans. (a) 1.4 ; (b) 7.94 )


37. (a) The mean and s.d. of 20 items are found to be 10 and 2
respectively. At the time of checking it was found that one item was
incorrect. Calculate the mean and s.d. if

(i) the wrong item is omitted, and
(ii) it is replaced by 12.


(b) The means of two samples of sizes 50 and 100 respectively
are 54.1 and 50.3 and the standard deviations are 8 and 7. Obtain
the s.d. of the sample of size 150 obtained by combining the two
samples. [I.C.W.A. Dec. 77]

( Ans. (a) (i) 10.11, 1.96 ; (ii) 10.2, 1.99.
(b) 51.57, 7.56. )


38. (a) If the first quartile is 149 and the semi-interquartile range | 


is 18, find the median (assuming the distribution to be symmetrical — 
about mean or median). 


(6) Prove that the standard deviation is independent of any 
change of origin but is dependent on the change of scale. 


| 
se (c) Compute the arithmetic mean, standard deviation and the 4 
mean deviation about the mean for the following data : 


4—5 | 6—7 | 
8 3 60 


LE. C. W. A. Dec. 1978] ( Ans. (a) 160, (c) 9288 ; 9°476 ; 90311) 


Scores ; 


8—9 | 10—11 
20 


12—13 [14—16 Total 


4 10 


39. Calculate the variance of the following distribution (correct
up to 3 places after decimal), stating any necessary assumptions:

Height in inches        No. of men (hundreds)
under 62                        6
62 and under 63                10
63  "    "   64                10
64  "    "   65                40
65  "    "   66                72
66  "    "   67                78
67  "    "   68                90
68  "    "   69                88
69 and over                    56
Total                         450

[I.C.W.A. June 1979] (Ans. 3.348)
40. The means of two samples of sizes 50 and 100 respectively
are 54.4 and 58.5 and the standard deviations are 9 and 11. Obtain
the mean and standard deviation of the sample of size 150 obtained by
combining the two samples. [I.C.W.A. June 1979]
(Ans. 57.18; 10.56)
41. For a group of 50 boys the mean score and the standard
deviation of scores on a test are 59.5 and 8.38 respectively. For a
group of 40 girls the same results are 54.0 and 8.23 respectively.
Find the mean and the standard deviation of the combined group of
90 children. [C.U. B. Com. (Hons.) 1980] (Ans. 57.06; 8.75)
42. The first of the two samples has 100 items with mean 0.23
and S.D. 561. If the second sample has 75 items with mean 22.04
and S.D. 1.84, find the mean and variance (square of S.D.) of the sample
obtained by combining the two. [I.C.W.A. Dec. 1979]
(Ans. 9.38; 185.20)
43. The means of two samples of sizes 50 and 100 respectively
are 54.1 and 50.3 and the S.D. are 8 and 7. Find the S.D. of the
sample of size 150 obtained by combining the two samples.
[I.C.W.A. June 1980] (Ans. 51.57; 7.56)
44. Goals scored by teams A and B in football matches were
as follows:

No. of goals scored in a match      Number of matches
                                       A        B
            0                         26       18
            1                         10        8
            2                    [illegible]    5
            3                          6        6
            4                          4        3

Calculate the mean and the S.D. in each case.
[I.C.W.A. Dec. 1979] (Ans. For A: 1.1, 1.33; For B: 1.2, 1.35)
45. Calculate mean and S.D. of the following data:

Age (years)      No. of persons dying
Under 10                 15
Under 20                 30
Under 30                 53
Under 40                 75
Under 50                100
Under 60                110
Under 70                115
Under 80                125

[I.C.W.A. June 1980] (Ans. 35.16 yrs.; 19.76 yrs.)


MOMENTS, SKEWNESS AND KURTOSIS 


(A) Moments. 


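For n observations $x_1, x_2, x_3, \ldots, x_n$, the arithmetic mean of
the rth power of deviations taken from an arbitrary constant A is
defined as the rth moment about A (denoted by $m_r$).

So, $m_r = \frac{1}{n}\sum (x_i - A)^r$,
i.e., $m_r = \frac{1}{n}\left[(x_1 - A)^r + (x_2 - A)^r + (x_3 - A)^r + \cdots + (x_n - A)^r\right]$ ... (1)

For r = 1, $m_1 = \frac{1}{n}\sum (x_i - A)$ is known as the 1st moment about A,
r = 2, $m_2 = \frac{1}{n}\sum (x_i - A)^2$ is known as the 2nd moment about A,
r = 3, $m_3 = \frac{1}{n}\sum (x_i - A)^3$ is known as the 3rd moment about A,
r = 4, $m_4 = \frac{1}{n}\sum (x_i - A)^4$ is known as the 4th moment about A,

and so on. Now it may be noticed (from the 1st moment about A) that
$m_1 = \frac{1}{n}\sum (x_i - A) = \frac{1}{n}\left(\sum x_i - \sum A\right) = \frac{1}{n}\left(\sum x_i - nA\right) = \bar{x} - A$,
i.e., the 1st moment about A $= \bar{x} - A$ ... (2)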


Cases. 


Now (i) when A = 0, we find moments about zero. These are
called Raw Moments. So the rth raw moment is defined as

$m_r = \frac{1}{n}\sum x_i^r$.

For r = 1, $m_1 = \frac{1}{n}\sum x_i = \bar{x}$ (arithmetic mean).

We also find this by putting A = 0 in (2).
So, the 1st raw moment is the arithmetic mean.
(ii) When $A = \bar{x}$, we find moments about the mean. These are
known as Central Moments. So the rth central moment is defined by

$m_r' = \frac{1}{n}\sum (x_i - \bar{x})^r$

[The dash (′) is used to distinguish it from other moments.]

Now for r = 1, $m_1' = \frac{1}{n}\sum (x_i - \bar{x}) = 0$, as $\sum (x_i - \bar{x}) = 0$,
r = 2, $m_2' = \frac{1}{n}\sum (x_i - \bar{x})^2 = \sigma^2$ (square of s.d.).

So we get that the 1st central moment is always zero and the 2nd
central moment is the variance $\sigma^2$, so that $\sigma = \sqrt{m_2'}$ (s.d. is the square
root of the 2nd central moment).

Note. The 3rd central moment ($m_3'$) is used to measure skewness and the
4th central moment ($m_4'$) to measure kurtosis. Higher-order moments are of
little use.


Example.
For the numbers 2, 4, 6, 8, find the first four moments about 4.

Calculation of Moments

   x      x-4     (x-4)^2    (x-4)^3    (x-4)^4
   2      -2         4         -8         16
   4       0         0          0          0
   6       2         4          8         16
   8       4        16         64        256
 Total     4        24         64        288

$m_1 = \frac{\sum (x-4)}{n} = \frac{4}{4} = 1$, $m_2 = \frac{\sum (x-4)^2}{n} = \frac{24}{4} = 6$,

$m_3 = \frac{\sum (x-4)^3}{n} = \frac{64}{4} = 16$, $m_4 = \frac{\sum (x-4)^4}{n} = \frac{288}{4} = 72$.
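Example.

For the numbers 2, 4, 6, 8, find the first four central moments.

Calculation of Moments

   x      x-5     (x-5)^2    (x-5)^3    (x-5)^4
   2      -3         9        -27         81
   4      -1         1         -1          1
   6       1         1          1          1
   8       3         9         27         81
 Total     0        20          0        164

$\bar{x}$ (= A.M.) $= \frac{\sum x}{n} = \frac{20}{4} = 5$.

$m_1' = \frac{\sum (x-5)}{n} = \frac{0}{4} = 0$, $m_2' = \frac{\sum (x-5)^2}{n} = \frac{20}{4} = 5$,

$m_3' = \frac{\sum (x-5)^3}{n} = \frac{0}{4} = 0$, $m_4' = \frac{\sum (x-5)^4}{n} = \frac{164}{4} = 41$.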
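The two examples above can be checked with a short Python sketch. The function names `moment_about` and `central_moment` are illustrative choices, not notation from the text; the code simply applies the definition $m_r = \frac{1}{n}\sum (x_i - A)^r$.

```python
def moment_about(values, a, r):
    """Return the r-th moment of `values` about the point `a`."""
    n = len(values)
    return sum((x - a) ** r for x in values) / n


def central_moment(values, r):
    """Return the r-th central moment (the moment about the arithmetic mean)."""
    mean = sum(values) / len(values)
    return moment_about(values, mean, r)


data = [2, 4, 6, 8]

# First four moments about A = 4 (first example).
about_4 = [moment_about(data, 4, r) for r in range(1, 5)]

# First four central moments, i.e. moments about the mean 5 (second example).
central = [central_moment(data, r) for r in range(1, 5)]

print(about_4)
print(central)
```

Setting `a = 0` in `moment_about` gives the raw moments in the same way.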


Moments from Frequency Distributions. 


For the observations $x_1, x_2, \ldots, x_n$, if $f_1, f_2, \ldots, f_n$ be the
respective frequencies, then the rth moment about A (arbitrary
constant) is

$M_r = \frac{1}{N}\sum f_i (x_i - A)^r$, where $\sum f_i = N$.

Note. For a grouped frequency distribution, the mid-values of the different
classes represent the observations $x_1, x_2, x_3, \ldots, x_n$.

Now for A = 0, we find the rth raw moment (or moment about zero),

$M_r = \frac{1}{N}\sum f_i x_i^r$, where $\sum f_i = N$.

For r = 1, $M_1 = \frac{1}{N}\sum f_i x_i = \bar{X}$ (weighted arithmetic mean).

So we again find that the 1st raw moment is the A.M.
Again for $A = \bar{X}$, we get the rth central moment,

$M_r' = \frac{1}{N}\sum f_i (x_i - \bar{X})^r$, where $\bar{X} = \frac{\sum f_i x_i}{N}$ and $\sum f_i = N$.
For r = 1, $M_1 = \frac{1}{N}\sum f_i (x_i - A) = \frac{1}{N}\left[\sum f_i x_i - A\sum f_i\right]$ [from the definition of $M_r$]

$= \bar{X} - A \cdot \frac{N}{N} = \bar{X} - A$ (as $\sum f_i = N$).

If $y = \frac{x - c}{d}$, where c and d are constants, then the rth central
moment of the variate x is equal to $d^r$ times the rth central moment
of the new variate y. Thus

$M'_{r(x)} = d^r M'_{r(y)}$.

As the values of the new variate y are small,
$M'_{r(y)}$ may be calculated from the raw moments of y. Now $M'_{r(x)}$
can be calculated by multiplying the values of $M'_{r(y)}$ by $d^r$.

For the 1st central moment, i.e., for $A = \bar{X}$,

we find $M_1' = \bar{X} - \bar{X} = 0$;

for r = 2, $M_2' = \frac{1}{N}\sum f_i (x_i - \bar{X})^2 = \sigma^2$.

Thus the 2nd central moment is the square of the standard deviation and
is called the variance.

So we find that the 1st central moment is always zero, and the 2nd
central moment is the variance.


Example.
Find the first four central moments of the following data:
x:  2   3   5   6
f:  3   2   2   3

Calculation of Moments

   x     f     fx    x-4    f(x-4)   f(x-4)^2   f(x-4)^3   f(x-4)^4
   2     3      6    -2      -6        12         -24         48
   3     2      6    -1      -2         2          -2          2
   5     2     10     1       2         2           2          2
   6     3     18     2       6        12          24         48
 Total  10     40     0       0        28           0        100

$\bar{X} = \frac{\sum fx}{\sum f} = \frac{40}{10} = 4$,

$M_1' = \frac{\sum f(x-4)}{N} = \frac{0}{10} = 0$, $M_2' = \frac{\sum f(x-4)^2}{N} = \frac{28}{10} = 2.8$,

$M_3' = \frac{\sum f(x-4)^3}{N} = \frac{0}{10} = 0$, $M_4' = \frac{\sum f(x-4)^4}{N} = \frac{100}{10} = 10$.
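The frequency-weighted definition $M_r = \frac{1}{N}\sum f_i (x_i - A)^r$ can be sketched in Python as follows. The helper name `freq_moment` is an illustrative assumption; the data are those of the example above.

```python
def freq_moment(xs, fs, a, r):
    """r-th moment about `a` for values `xs` occurring with frequencies `fs`."""
    total = sum(fs)  # N, the total frequency
    return sum(f * (x - a) ** r for x, f in zip(xs, fs)) / total


xs = [2, 3, 5, 6]
fs = [3, 2, 2, 3]

# The first raw moment (A = 0, r = 1) is the weighted arithmetic mean.
mean = freq_moment(xs, fs, 0, 1)

# Central moments are moments about the mean.
central = [freq_moment(xs, fs, mean, r) for r in range(1, 5)]

print(mean)
print(central)
```

For a grouped distribution the class mid-values would be passed as `xs`, as the Note on grouped data indicates.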


Effect of Change of Origin on Moments. 
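Let the moments about A, i.e., $m_r = \frac{1}{n}\sum (x_i - A)^r$, be given, and
we are to find the moments about B, using the moments given.
The moments about B will be $M_r = \frac{1}{n}\sum (x_i - B)^r$.

Now $x_i - B = (x_i - A) - (B - A) = (x_i - A) - d$, where $B - A = d$,
or, $(x_i - B)^r = \{(x_i - A) - d\}^r$, raising both sides to the rth power,
$= (x_i - A)^r - {}^rC_1 (x_i - A)^{r-1} d + {}^rC_2 (x_i - A)^{r-2} d^2 - \cdots + (-1)^r d^r$ (using binomial expansion)
or, $\sum (x_i - B)^r = \sum (x_i - A)^r - {}^rC_1 d \sum (x_i - A)^{r-1} + {}^rC_2 d^2 \sum (x_i - A)^{r-2} - \cdots + (-1)^r n d^r$
(now taking the aggregate ($\sum$) and multiplying by $\frac{1}{n}$ on both sides)
or, $\frac{1}{n}\sum (x_i - B)^r = \frac{1}{n}\sum (x_i - A)^r - {}^rC_1 d \cdot \frac{1}{n}\sum (x_i - A)^{r-1} + {}^rC_2 d^2 \cdot \frac{1}{n}\sum (x_i - A)^{r-2} - \cdots + (-1)^r \frac{1}{n} \cdot n d^r$
or, $M_r = m_r - {}^rC_1 d\, m_{r-1} + {}^rC_2 d^2 m_{r-2} - \cdots + (-1)^r d^r$.
For r = 1, $M_1 = m_1 - d$, here $d = B - A$,
r = 2, $M_2 = m_2 - 2dm_1 + d^2$,
r = 3, $M_3 = m_3 - 3dm_2 + 3d^2m_1 - d^3$,
r = 4, $M_4 = m_4 - 4dm_3 + 6d^2m_2 - 4d^3m_1 + d^4$.

Thus we find that the moments about B ($M_r$) can be expressed in terms of
the moments about A ($m_r$) by the above formulae, where $d = B - A$.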
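The general formula $M_r = \sum_k (-1)^k\, {}^rC_k\, d^k\, m_{r-k}$ (with $m_0 = 1$) can be sketched in Python. The function name `shift_origin` is an illustrative assumption; `math.comb` supplies the binomial coefficients.

```python
from math import comb


def shift_origin(moments_about_a, d):
    """Convert moments about A into moments about B = A + d.

    moments_about_a[k - 1] is the k-th moment about A (k = 1, 2, ...).
    """
    full = [1.0] + list(moments_about_a)  # prepend the zeroth moment, which is 1
    shifted = []
    for r in range(1, len(moments_about_a) + 1):
        total = sum(comb(r, k) * (-d) ** k * full[r - k] for k in range(r + 1))
        shifted.append(total)
    return shifted


# Moments about A = 1 moved to B = 4, so d = 3.
about_4 = shift_origin([2.6, 10.2, 43.4, 192.6], 3)

# Moments about A = 7 moved to the origin, so d = -7.
about_0 = shift_origin([0.2, 19.4, -41.0], -7)

print(about_4)
print(about_0)
```

Taking d equal to the first moment about A moves the origin to the mean, so the same function also yields central moments.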
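Relation between Central Moments and Non-central Moments.

(a) Expression of Central Moments in terms of Non-central Moments.

Central moment $m_r' = \frac{1}{n}\sum (x_i - \bar{x})^r$.

Non-central moment about A (say)
$m_r = \frac{1}{n}\sum (x_i - A)^r$.

Now $x_i - \bar{x} = (x_i - A) - (\bar{x} - A) = (x_i - A) - d$,
where $d = \bar{x} - A$,
or, $(x_i - \bar{x})^r = \{(x_i - A) - d\}^r$
$= (x_i - A)^r - {}^rC_1 (x_i - A)^{r-1} d + {}^rC_2 (x_i - A)^{r-2} d^2 - \cdots + (-1)^r d^r$
or, $\sum (x_i - \bar{x})^r = \sum (x_i - A)^r - {}^rC_1 d \sum (x_i - A)^{r-1} + {}^rC_2 d^2 \sum (x_i - A)^{r-2} - \cdots + (-1)^r n d^r$
or, $\frac{1}{n}\sum (x_i - \bar{x})^r = \frac{1}{n}\sum (x_i - A)^r - {}^rC_1 d \cdot \frac{1}{n}\sum (x_i - A)^{r-1} + {}^rC_2 d^2 \cdot \frac{1}{n}\sum (x_i - A)^{r-2} - \cdots + (-1)^r \frac{1}{n} \cdot n d^r$
or, $m_r' = m_r - {}^rC_1 d\, m_{r-1} + {}^rC_2 d^2 m_{r-2} - \cdots + (-1)^r d^r$.
Putting r = 1, 2, 3, 4 in turn (here $d = \bar{x} - A$),
$m_1' = m_1 - d$,
$m_2' = m_2 - 2m_1 d + d^2$,
$m_3' = m_3 - 3m_2 d + 3m_1 d^2 - d^3$,
$m_4' = m_4 - 4m_3 d + 6m_2 d^2 - 4m_1 d^3 + d^4$.
Again we know $m_1$ (1st moment about A) $= \bar{x} - A$.
So putting $m_1 = d$,
we find $m_1' = m_1 - m_1 = 0$,
$m_2' = m_2 - 2m_1^2 + m_1^2 = m_2 - m_1^2$,
$m_3' = m_3 - 3m_2 m_1 + 2m_1^3$,
$m_4' = m_4 - 4m_3 m_1 + 6m_2 m_1^2 - 3m_1^4$.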

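(b) Expression of Non-central Moments in terms of Central Moments.

$(x_i - A) = (x_i - \bar{x}) + (\bar{x} - A) = (x_i - \bar{x}) + d$, where $d = \bar{x} - A$,
or, $(x_i - A)^r = (x_i - \bar{x})^r + {}^rC_1 (x_i - \bar{x})^{r-1} d + {}^rC_2 (x_i - \bar{x})^{r-2} d^2 + \cdots + d^r$
or, $\sum (x_i - A)^r = \sum (x_i - \bar{x})^r + {}^rC_1 d \sum (x_i - \bar{x})^{r-1} + {}^rC_2 d^2 \sum (x_i - \bar{x})^{r-2} + \cdots + n d^r$
or, $\frac{1}{n}\sum (x_i - A)^r = \frac{1}{n}\sum (x_i - \bar{x})^r + {}^rC_1 d \cdot \frac{1}{n}\sum (x_i - \bar{x})^{r-1} + {}^rC_2 d^2 \cdot \frac{1}{n}\sum (x_i - \bar{x})^{r-2} + \cdots + \frac{1}{n} \cdot n d^r$
or, $m_r = m_r' + {}^rC_1 d\, m_{r-1}' + {}^rC_2 d^2 m_{r-2}' + \cdots + d^r$.

Taking r = 1, 2, 3, 4, we find respectively

$m_1 = m_1' + d$, here $d = \bar{x} - A$,

$m_2 = m_2' + 2dm_1' + d^2$,

$m_3 = m_3' + 3dm_2' + 3d^2m_1' + d^3$,

$m_4 = m_4' + 4dm_3' + 6d^2m_2' + 4d^3m_1' + d^4$.

Since $m_1'$ (1st central moment) $= 0$ and $d = \bar{x} - A = m_1$,

$m_1 = 0 + d = m_1$,

$m_2 = m_2' + m_1^2$,

$m_3 = m_3' + 3m_2' m_1 + m_1^3$,

$m_4 = m_4' + 4m_3' m_1 + 6m_2' m_1^2 + m_1^4$.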


Worked out Examples.

(1) The first 3 moments of a distribution about the value 7,
calculated from a set of 9 observations, are 0.2, 19.4 and -41.0. Find
the measures of central tendency and dispersion, and also the third
moment about origin. [I.C.W.A. Dec. '75]

Here $m_1 = 0.2$, $m_2 = 19.4$, $m_3 = -41.0$ about A = 7. The measures
of central tendency and dispersion indicate the mean ($\bar{x}$) and S.D. ($\sigma$).

We know, $m_1 = \bar{x} - A$
or, $0.2 = \bar{x} - 7$
or, $\bar{x}$ (mean) $= 7 + 0.2 = 7.2$.
$\sigma$ (s.d.) is the square root of the 2nd central moment ($m_2'$).
We know, $m_2' = m_2 - 2dm_1 + d^2$, where $d = \bar{x} - A = 7.2 - 7 = 0.2$,
$= 19.4 - 2(0.2)(0.2) + (0.2)^2 = 19.4 - 0.08 + 0.04$
$= 19.36$
$\therefore \sigma = \sqrt{19.36} = 4.4$.
Taking $M_3$ as the 3rd moment about origin,
$M_3 = m_3 - 3dm_2 + 3d^2m_1 - d^3$, here $d = 0 - A = -7$,
$= -41 - 3(-7)(19.4) + 3(-7)^2(0.2) - (-7)^3$
$= -41 + 407.4 + 29.4 + 343 = 738.8$.
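[Alternative Way]

$\frac{1}{n}\sum (x_i - 7) = 0.2$ or, $\frac{1}{n}\sum x_i - 7 = 0.2$ or, $\bar{x} = 7 + 0.2 = 7.2$.

Again $\frac{1}{n}\sum (x_i - 7)^2 = 19.4$ or, $\frac{1}{n}\sum (x_i^2 - 14x_i + 49) = 19.4$

or, $\frac{1}{n}\sum x_i^2 - 14 \cdot \frac{1}{n}\sum x_i + 49 = 19.4$

or, $\frac{1}{n}\sum x_i^2 - 14 \times 7.2 + 49 = 19.4$
or, $\frac{1}{n}\sum x_i^2 = 19.4 + 14 \times 7.2 - 49 = 19.4 + 100.8 - 49 = 71.2$.
$\sigma^2 = \frac{1}{n}\sum x_i^2 - (\bar{x})^2 = 71.2 - (7.2)^2 = 19.36$,
$\sigma = \sqrt{19.36} = 4.4$.
Next $\frac{1}{n}\sum (x_i - 7)^3 = -41.0$

or, $\frac{1}{n}\sum (x_i^3 - 3x_i^2 \cdot 7 + 3x_i \cdot 7^2 - 343) = -41.0$

or, $\frac{1}{n}\sum x_i^3 - 21 \cdot \frac{1}{n}\sum x_i^2 + 147 \cdot \frac{1}{n}\sum x_i - 343 = -41$

or, $\frac{1}{n}\sum x_i^3 - 21 \times 71.2 + 147 \times 7.2 - 343 = -41$

or, $\frac{1}{n}\sum x_i^3 = 1495.2 - 1058.4 + 343 - 41 = 738.8$.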


(2) The first four moments about the value 1 are 2.6, 10.2, 43.4
and 192.6 respectively. Find the arithmetic mean and the first four
moments about the value 4.
Here $m_1 = 2.6$, $m_2 = 10.2$, $m_3 = 43.4$, $m_4 = 192.6$ and A = 1.
We know $m_1 = \bar{x} - A$ or, $\bar{x} = m_1 + A = 2.6 + 1 = 3.6$.
Now $d = B - A = 4 - 1 = 3$, and the required four moments are:
$M_1 = m_1 - d = 2.6 - 3 = -0.4$,
$M_2 = m_2 - 2dm_1 + d^2 = 10.2 - 2(3)(2.6) + 3^2 = 10.2 - 15.6 + 9 = 3.6$,
$M_3 = m_3 - 3dm_2 + 3d^2m_1 - d^3 = 43.4 - 3(3)(10.2) + 3 \cdot 3^2(2.6) - 3^3$
$= 43.4 - 91.8 + 70.2 - 27 = -5.2$,
$M_4 = m_4 - 4dm_3 + 6d^2m_2 - 4d^3m_1 + d^4 = 192.6 - 4(3)(43.4) + 6 \cdot 3^2(10.2) - 4(3)^3(2.6) + 3^4$
$= 192.6 - 520.8 + 550.8 - 280.8 + 81 = 22.8$.
(3) Find the first, second and third central moments of the
frequency distribution given below:

Range of expenditure
in Rs. per month         No. of families
   3–6                        28
   6–9                       292
   9–12                      389
  12–15                      212
  15–18                       59
  18–21                       18
  21–24                        2

[I.C.W.A. Inter. June 1978]

Computation of Moments

Class-       Mid. pt.      f      y = (x - 13.5)/3     fy      fy^2     fy^3
interval        x
 3–6           4.5         28          -3             -84      252      -756
 6–9           7.5        292          -2            -584     1168     -2336
 9–12         10.5        389          -1            -389      389      -389
12–15         13.5        212           0               0        0         0
15–18         16.5         59           1              59       59        59
18–21         19.5         18           2              36       72       144
21–24         22.5          2           3               6       18        54
Total                    1000                        -956     1958     -3224

Raw moments of y:

$M_1 = \frac{\sum fy}{N} = \frac{-956}{1000} = -0.956$,
$M_2 = \frac{\sum fy^2}{N} = \frac{1958}{1000} = 1.958$,
$M_3 = \frac{\sum fy^3}{N} = \frac{-3224}{1000} = -3.224$.

Central moments of y:
$M_1' = M_1 - M_1 = 0$,
$M_2' = M_2 - M_1^2 = 1.958 - (-0.956)^2 = 1.958 - 0.914 = 1.044$,
$M_3' = M_3 - 3M_2M_1 + 2M_1^3 = (-3.224) - 3(1.958)(-0.956) + 2(-0.956)^3$
$= -3.224 + 5.616 - 1.7480 = 0.644$.

Central moments of x:
$M'_{1(x)} = d \times M'_{1(y)} = 3 \times 0 = 0$,
$M'_{2(x)} = d^2 \times M'_{2(y)} = 9 \times 1.044 = 9.396$,
$M'_{3(x)} = d^3 \times M'_{3(y)} = 27 \times 0.644 = 17.388$.
Now mean $(\bar{x}) = c + d\bar{y} = 13.5 + 3 \times (-0.956)$
$= 13.5 - 2.868 = 10.632$.

(B) Skewness.

A frequency distribution is said to be 'symmetrical' if the
frequencies are distributed symmetrically (or evenly) on either side of
an average. When plotted on graph paper, such distributions will
show a normal or ideal curve. In a normal curve, the values of mean,
median and mode coincide and the quartiles are equidistant from the
median. In such cases, the sum of the deviations measured from
mean, median or mode would be zero. A normal curve is a bell-shaped
curve, in which the values on either side of an average are symmetrical.
In general, frequency distributions are not symmetrical; they are
slightly or highly asymmetrical. Skewness is the opposite of symmetry.
The presence of skewness indicates that a particular distribution is
not symmetrical. The word skewness literally denotes asymmetry (or
lack of symmetry).

Measures of skewness show not only the amount of skewness
but also its direction.

A distribution is said to be positively skewed when it has a long
tail towards the higher values of the variable, and negatively skewed
when a longer tail is present towards the lower values of the
variable.


The following figures give us an idea about the shape of
symmetrical and asymmetrical curves.

[Fig. 36: (a) symmetrical curve; (b) positively skewed curve; (c) negatively skewed curve]

Figure 36(a) shows the shape of an ideal symmetrical curve. It
is bell-shaped and the values of mean, median and mode would be
equal.

Figure 36(b) indicates a moderately skewed curve. In it the
value of the mean would be greater than that of the median, which would
in turn be greater than the mode. The curve is skewed to the right and is
known as positively skewed.

In Figure 36(c), the value of the mean would be less than that of the
median, which would again be less than the mode. It is skewed to the
left and is known as negatively skewed.


Test of Skewness. 


(1) In a skew distribution, the values of mean, median and
mode would not be the same.

(2) The two quartiles would not be equidistant from the median, or
$(Q_3 - M) - (M - Q_1)$ would not be zero.

(3) When plotted on graph paper, a skew distribution would
not show a bell-shaped curve [as in Figure 36(a)].


Measures of Skewness. 


It has been discussed earlier that the mode is not influenced by the
presence of extreme values, the median is influenced by their position
only and the mean is influenced by the size of the extreme values.
The shape of a frequency distribution as such has an influence on the
values of mean, median and mode. For a symmetrical distribution
mean, median and mode coincide, but when the distribution is
asymmetrical the mean and median move away from the mode
towards the extreme values. The mean moves more than the median, i.e., for
an asymmetrical distribution M < Me < Mo or M > Me > Mo.
Consequently the distance between mean and mode, say, mean - mode,
may be used to measure skewness.

But such a measure has the following shortcomings:

(i) This measure, being a measure of absolute skewness, is always
in terms of the unit used in the original observations. So it is not
possible to compare the skewness of two distributions which are in
different units.

(ii) For identically skewed curves, the same amount of skewness
has a much different meaning for a distribution of small dispersion than
for a distribution of considerably large dispersion.

In order to make a valid comparison between the skewness of two
or more distributions, some measures are devised which eliminate the
above two shortcomings. Such a measure, known as a coefficient of
skewness, is a relative measure of skewness obtained by dividing the
absolute measure of skewness by some measure of dispersion.

The most widely used measure of skewness is Pearson's measure
of skewness, which is given by $\frac{\text{mean} - \text{mode}}{\text{s.d.}}$.
This skewness will be positive when the skewness is to the right,
i.e., when mean > mode, and will be negative when the skewness is to
the left, i.e., when mean < mode.

For most frequency distributions, it may be difficult to determine
the position of the mode, while the median may be located satisfactorily.
So the empirical relation mean - mode = 3(mean - median) is used to
measure the skewness when the distribution is moderately asymmetrical.

So we get $\frac{3(\text{mean} - \text{median})}{\text{s.d.}}$ as another measure.

Again skewness may be measured by considering the relative
positions of the three quartiles. For a symmetrical distribution we
get $M_e - Q_1 = Q_3 - M_e$. For a positively skewed distribution
$Q_3 - M_e > M_e - Q_1$, while for a negatively skewed distribution
$Q_3 - M_e < M_e - Q_1$. Thus $(Q_3 - M_e) - (M_e - Q_1)$ may be taken as an
absolute measure of skewness, which again may be put into relative
terms on being divided by $(Q_3 - M_e) + (M_e - Q_1) = Q_3 - Q_1$. So the
relative measure is $\frac{(Q_3 - M_e) - (M_e - Q_1)}{Q_3 - Q_1} = \frac{Q_3 + Q_1 - 2M_e}{Q_3 - Q_1}$. This is
known as Bowley's Measure.

All the above measures of skewness have two desirable properties
which any measure of skewness should have: they are equal to zero
when the distribution is symmetrical, and they are all pure numbers.

It has been found that, in practice, these measures of skewness usually
lie between -1 and +1.


So we find

(a) First measure of skewness:

Karl Pearson's coefficient of skewness $= \frac{\text{mean} - \text{mode}}{\text{s.d.}}$

or, (when mode is ill-defined) $= \frac{3(\text{mean} - \text{median})}{\text{s.d.}}$

(b) Second measure of skewness:

Bowley's coefficient of skewness $= \frac{Q_3 + Q_1 - 2M_e}{Q_3 - Q_1}$

(c) Moment measure:

Coefficient of skewness $= \frac{(m_3')^2}{(m_2')^3}$, where $m_2'$, $m_3'$ are the second
and third central moments respectively. This coefficient of skewness is
represented by $\beta_1$ (read as 'Beta one').

It has been found that the value of Bowley's measure lies between
-1 and +1.

Some statisticians use $\sqrt{\beta_1} = \frac{m_3'}{(m_2')^{3/2}}$, which keeps the sign of $m_3'$.

Note: (i) $\beta_1$ will be zero in case of a symmetrical distribution. A
greater value of $\beta_1$ indicates that the distribution is more
skewed.

(ii) Since $\beta_1$ itself cannot be negative, the direction of skewness is
read from the sign of $m_3'$: a positive value means the distribution is
positively skewed, and a negative value indicates that it is negatively
skewed.

(iii) If mean > median > mode, then the distribution is positively
skewed. Again if mean < median < mode, then the distribution
is negatively skewed.
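The three relative measures listed above can be written as small Python functions. The function names are illustrative assumptions; the sample figures (A.M. 100, mode 93, s.d. 35, and quartiles 14.6, 18.8, 25.2) are chosen only to exercise the formulae.

```python
def pearson_skewness(mean, mode, sd):
    """Karl Pearson's coefficient: (mean - mode) / s.d."""
    return (mean - mode) / sd


def pearson_skewness_median(mean, median, sd):
    """Alternative form for an ill-defined mode: 3(mean - median) / s.d."""
    return 3 * (mean - median) / sd


def bowley_skewness(q1, median, q3):
    """Bowley's coefficient: (Q3 + Q1 - 2 Me) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


def moment_skewness(m2, m3):
    """Moment measure beta_1 = (m3')^2 / (m2')^3 from central moments."""
    return m3 ** 2 / m2 ** 3


sk_mode = pearson_skewness(100, 93, 35)
sk_bowley = bowley_skewness(14.6, 18.8, 25.2)
print(round(sk_mode, 3), round(sk_bowley, 3))
```

All four functions return pure numbers, so distributions measured in different units can be compared.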


Example.
Comment on the following results of averages of any distribution:

(i) A.M. is 10, median is 11.
(ii) A.M. is 15, median is 12.
(iii) Mode is 11, median is 13.
(iv) Median is 10, A.M. is 14.
(v) Median is 12, mode is 13.

(i) A.M. (10) < median (11),
the distribution is negatively skewed.
(ii) A.M. (15) > median (12),
the distribution is positively skewed.
(iii) Median (13) > mode (11),
the distribution is positively skewed.
(iv) A.M. (14) > median (10),
the distribution is positively skewed.
(v) Median (12) < mode (13),
the distribution is negatively skewed.


Example.
Compute Karl Pearson's coefficient of skewness from the
following data:

Variable       Frequency  |  Variable       Frequency
20.5–23.5         17      |  29.5–32.5        194
23.5–26.5        193      |  32.5–35.5         27
26.5–29.5        399      |  35.5–38.5         10

[Delhi, B. Com. 1953]

Calculation of coefficient of skewness

Variable      Mid. pt. (x)     f     d = x - 28    d' = d/3     fd'     fd'^2
20.5–23.5         22           17        -6           -2        -34       68
23.5–26.5         25          193        -3           -1       -193      193
26.5–29.5         28          399         0            0          0        0
29.5–32.5         31          194         3            1        194      194
32.5–35.5         34           27         6            2         54      108
35.5–38.5         37           10         9            3         30       90
Total                         840                                51      653

Formula for the required coefficient of skewness:

$\frac{\text{mean} - \text{mode}}{\text{s.d.}\ (\sigma)}$

Now, mean $= A + \frac{\sum fd'}{N} \times i = 28 + \frac{51}{840} \times 3 = 28 + 0.182 = 28.182$.

The mode lies in the class (26.5–29.5),
$l = 26.5$, $f_0 = 193$, $f_1 = 399$, $f_2 = 194$, $i = 3$.

Mode $= 26.5 + \frac{399 - 193}{2 \times 399 - 193 - 194} \times 3$
$= 26.5 + \frac{206}{411} \times 3$
$= 26.5 + 1.5 = 28$.

s.d. $(\sigma) = \sqrt{\frac{\sum fd'^2}{N} - \left(\frac{\sum fd'}{N}\right)^2} \times i$
$= \sqrt{\frac{653}{840} - \left(\frac{51}{840}\right)^2} \times 3$
$= 0.8784 \times 3 = 2.6352 \approx 2.64$ (calculation by log table).

Now, coefficient of skewness $= \frac{\text{mean} - \text{mode}}{\text{s.d.}}$
$= \frac{28.18 - 28}{2.64}$
$= \frac{0.18}{2.64} = 0.068$.
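The same calculation can be sketched in Python without the step-deviation shortcut. The variable names are illustrative, and the code assumes equal class widths and a modal class that is neither the first nor the last class.

```python
from math import sqrt

lower = [20.5, 23.5, 26.5, 29.5, 32.5, 35.5]   # lower class boundaries
freqs = [17, 193, 399, 194, 27, 10]            # class frequencies
width = 3.0                                    # common class width i

n = sum(freqs)
mids = [l + width / 2 for l in lower]

mean = sum(f * x for f, x in zip(freqs, mids)) / n
sd = sqrt(sum(f * (x - mean) ** 2 for f, x in zip(freqs, mids)) / n)

# Mode by interpolation inside the modal class (the class of highest frequency).
k = freqs.index(max(freqs))
f0, f1, f2 = freqs[k - 1], freqs[k], freqs[k + 1]
mode = lower[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

skewness = (mean - mode) / sd
print(round(mean, 2), round(mode, 2), round(sd, 2), round(skewness, 3))
```

The s.d. computed this way is about 2.639, marginally different from the log-table value 2.6352, but both round to 2.64.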
Example.

For a moderately skewed data A.M. = 100, coefficient of variation
= 35, Karl Pearson's coefficient of skewness = 0.2; find the mode and
median.

Coefficient of variation $= \frac{\text{S.D.}}{\text{A.M.}} \times 100$

or, $35 = \frac{\text{S.D.}}{100} \times 100$

$\therefore$ S.D. $= 35$.

Karl Pearson's coefficient of skewness $= \frac{\text{A.M.} - \text{Mode}}{\text{S.D.}}$

or, $0.2 = \frac{100 - \text{mode}}{35}$

or, 100 - mode = 35 × 0.2
or, 100 - mode = 7.0
or, mode = 100 - 7 = 93.
We know, A.M. - mode = 3(A.M. - median)
or, 100 - 93 = 3(100 - median)
or, 7 = 300 - 3 median
or, 3 median = 293
$\therefore$ median $= \frac{293}{3} = 97.67$.
Example.
The second and third central moments of four numbers are 5
and 0 respectively. Find the coefficient of skewness ($\beta_1$) by moment
measure.
$\beta_1 = \frac{(m_3')^2}{(m_2')^3} = \frac{0}{125} = 0$.
Here $\beta_1$ is zero, which means that the distribution is symmetrical.


(C) Kurtosis.

The expression Kurtosis indicates whether a particular distribution
is more flat-topped or more peaked than the normal distribution.
The normal curve (or bell-shaped curve) is called Mesokurtic. The curve which
is more flat-topped than the normal curve is known as Platykurtic and the
curve which is more peaked than the normal curve is known as Leptokurtic.
From the given Figure 37, the idea of the curves will be clear.

[Fig. 37: Leptokurtic, mesokurtic and platykurtic curves]

Measures of Kurtosis.
The coefficient $\beta_2$ (read as 'Beta two') is used for the measure,
where $\beta_2 = \frac{m_4'}{(m_2')^2}$. The standard value of $\beta_2$ is 3. In a normal (or meso-
kurtic) curve, $\beta_2$ is 3. If $\beta_2 < 3$, the curve is more flat-topped, i.e., the
curve is platykurtic. If again $\beta_2 > 3$, the curve is leptokurtic, i.e., the
curve is more peaked.

It may be mentioned here that the knowledge of central moments
is utilised in finding out kurtosis. Kurtosis is mainly used in
biological studies.


Dispersion, Skewness and Kurtosis.

Dispersion indicates the scatteredness of items round a central
value. In skewness we find the extent of deviations below or above
an average. Measures of skewness give the shape of the series and the
size of variation on either side of an average. Kurtosis studies the
concentration of items at the central part of a series.
Example.
The first four central moments of a distribution are 0, 2.5, 0.7
and 18.75. Test the skewness and kurtosis of the distribution.

Coefficient of skewness $(\beta_1) = \frac{(m_3')^2}{(m_2')^3} = \frac{(0.7)^2}{(2.5)^3} = +0.031$.
Since $\beta_1$ is not zero and $m_3'$ is +ve, the distribution is positively skewed. For
kurtosis, we are to calculate $\beta_2 = \frac{m_4'}{(m_2')^2}$.

Now $\beta_2 = \frac{18.75}{(2.5)^2} = 3$.

Since $\beta_2 = 3$, the distribution is normal, i.e., the curve is
mesokurtic.


Example.

The first four central moments of a distribution are 0, 2.5, 0.7
and 18.75. Examine the skewness and kurtosis of the distribution.

Here $\beta_1 = \frac{(m_3')^2}{(m_2')^3} = \frac{(0.7)^2}{(2.5)^3} = +0.031$.
As $\beta_1$ is not zero and $m_3'$ is positive, the distribution is positively skewed.

Again $\beta_2 = \frac{m_4'}{(m_2')^2} = \frac{18.75}{(2.5)^2} = 3$.
Since $\beta_2$ is 3, the distribution is normal, i.e., the curve is
mesokurtic.


Example.

The first four central moments are 0, 6, 12, 120. Examine the
skewness and kurtosis.

Here $m_1' = 0$, $m_2' = 6$, $m_3' = 12$, $m_4' = 120$.

$\beta_1$ (= skewness) $= \frac{(m_3')^2}{(m_2')^3} = \frac{12^2}{6^3} = \frac{144}{216} = +0.667$.

The distribution is positively skewed.

Again $\beta_2$ (= kurtosis) $= \frac{m_4'}{(m_2')^2} = \frac{120}{6^2} = \frac{120}{36} = 3.33$.

Here $\beta_2 > 3$, i.e., the distribution is leptokurtic.
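The calculations in these examples can be sketched in Python. The function names and the tolerance used to decide whether $\beta_2$ equals 3 are illustrative assumptions.

```python
def beta_coefficients(m2, m3, m4):
    """Return (beta_1, beta_2) from the 2nd, 3rd and 4th central moments."""
    beta1 = m3 ** 2 / m2 ** 3
    beta2 = m4 / m2 ** 2
    return beta1, beta2


def kurtosis_type(beta2, tol=1e-9):
    """Classify a curve by comparing beta_2 with the standard value 3."""
    if abs(beta2 - 3) <= tol:
        return "mesokurtic"
    return "leptokurtic" if beta2 > 3 else "platykurtic"


b1_a, b2_a = beta_coefficients(2.5, 0.7, 18.75)
b1_b, b2_b = beta_coefficients(6, 12, 120)

print(round(b1_a, 3), b2_a, kurtosis_type(b2_a))
print(round(b1_b, 3), round(b2_b, 2), kurtosis_type(b2_b))
```

In practice a value of $\beta_2$ computed from data will rarely equal 3 exactly, so a wider tolerance may be more appropriate.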


EXERCISE 8

1. Define moments. Establish the relationship between the
moments about mean in terms of moments about any arbitrary point
and vice versa. [I.C.W.A. Inter. June '77 (N.S.)]
2. Find the first three central moments of 5, 8, 12, 14, 16.
(Ans. 0, 16, -18)
3. Find the first three raw moments of 5, 8, 12, 14, 16.
(Ans. 11, 137, 1841)
4. The first two moments of a distribution about the value 5 of
the variable are 2 and 20. Find the mean and variance.
[I.C.W.A. Inter. June '77] (Ans. 7; 16)
5. Find the first four central moments of the following data:
x:  2   3   4   5   6
f:  1   3   7   3   1
(Ans. 0; 0.933; 0; 2.533)

6. Find the first three moments about the mean of the following
data:

x:  2   3   4   5
f:  3   2   2   3                    (Ans. 0; 1.45; 0)
7. The first four moments about 1 are 2.6, 10.2, 43.4 and 192.6
respectively. Find the A.M. and also find the first four moments about 4.
(Ans. 3.6; -0.4, 3.6, -5.2, 22.8)
8. The first three moments about 3 are respectively 2, 10, 30.
Find the first three raw moments. Show also that the variance of the
distribution is 6. [I.C.W.A. Jan. 1964] (Ans. 5, 31, 201)

9. The first four moments about 2 are 1, 2.5, 5.5 and 16. Find
the first four moments about the A.M. and about zero.
(Ans. 0, 1.5, 0, 6; 3, 10.5, 40.5, 168)
10. Comment on the following average values of a distribution:
(i) Median is 21, mode is 12
(ii) A.M. is 12, median is 14
(iii) A.M. is 10, mode is 8
(iv) Mode is 15, median is 12
(v) Mode is 12, A.M. is 10
(vi) A.M. is 10, median is 10, mode is 10
[Ans. (i) +vely, (ii) -vely, (iii) +vely, (iv) -vely, (v) -vely skewed; (vi) symme-
trical distribution]
11. Find Karl Pearson's coefficient of skewness from the
following table:
x:  10   11   12   13   14   15
f:   2    4   10    8    5    1

(Ans. 0.857)

12. Find Karl Pearson's coefficient of skewness of the follow-
ing distribution table:

Age:              10–12 | 12–14 | 14–16 | 16–18 | 18–20 | 20–22 | 22–24
No. of students:    4   |  10   |  16   |  30   |  20   |  14   |   6

(Ans. 0.07)

13. Compute Bowley's coefficient of skewness from the
following data:

Marks       No. of students
 0–10            25
10–20            15
20–30            20
30–40            15
40–50            20
50–60            30
60–70            65
70–80            50
(Ans. -0.478)

14. Find $\beta_1$ and $\beta_2$ of the data given in question 5 and
comment on them. (Ans. 0, symmetry; 2.908, platykurtic)

15. Find $\beta_1$ and $\beta_2$ of the data given in question 9 and
comment on them. (Ans. 0, symmetry; 2.67, platykurtic)

16. Compute the quartile deviation and the coefficient of skewness, given
the following values:

Median = 18.8 cm., $Q_1$ = 14.6 cm., $Q_3$ = 25.2 cm.
(Ans. 5.3 cm.; 0.2)
17. Calculate the first four moments from the following data:
x:  0    1    2    3    4    5    6    7    8
f:  5   10   15   20   25   20   15   10    5
Also calculate the values of $\beta_1$ and $\beta_2$ and hence comment on the
nature of the distribution.
[Ans. 0; 4; 0; 37.6; $\beta_1 = 0$ (symmetrical);
$\beta_2 < 3$ (platykurtic)]

18. Find the second, third and fourth central moments of the
frequency distribution given below. Hence find (i) a measure of
skewness ($\beta_1$) and (ii) a measure of kurtosis ($\beta_2$).

Class-limits     Frequency
110.0–114.9          5
115.0–119.9         15
120.0–124.9         20
125.0–129.9         35
130.0–134.9         10
135.0–139.9         10
140.0–144.9          5

[I.C.W.A. Inter. June '76]

(Ans. 54; 100.5; 7827; $\sqrt{\beta_1} = 0.2533$; $\beta_2 - 3 = -0.3158$)

19. Calculate Karl Pearson's coefficient of skewness from the
following data:

Monthly salary (Rs.)     No. of workers
below  80                     12
  "    90                     30
  "   100                     65
  "   110                    107
  "   120                    157
  "   130                    202
  "   140                    229
  "   150                    230
(Ans. 0.248)

20. Using moments, calculate a measure of relative skewness
and a measure of relative kurtosis for the following distribution and
comment:

Monthly wages (Rs.)        No. of workers
 70 but below  90                8
 90  "    "   110               11
110  "    "   130               18
130  "    "   150                9
150  "    "   170                4

[Ans. 0.08; 2.306 (platykurtic)]
CORRELATION AND REGRESSION

(A) Correlation.

Meaning: In the previous chapters, we have discussed
problems relating to one variable only. We have observed how measures
of central tendency, measures of dispersion and skewness are calculated
for comparison and analysis. We have also seen how series are
represented by diagrams and charts. In practice, we face a large
number of problems involving the use of two or more variables.

If two sets of variables vary in such a way that changes in one
set are related to changes in the other, then these sets are said to be
correlated. For example, there is a relation between the income and
expenditure of a common family, the heights and weights of a group of
persons, rainfall and the production of a few commodities, the age of husband
and the age of wife, marks obtained by a group of students in two different
subjects, the price and demand of a commodity, etc. It is likely that if
the income of a common family increases, the expenditure of that family
also increases. Again, in general, with the increase of the height of a
person, the weight also increases. It may be mentioned here that
the two sets of variables should be related to, or dependent on, each other.

If the number of good cricket players in India increases and the
production of jute in Bangladesh increases, we cannot say that the
phenomena under consideration are related to each other, or that there is
any correlation between them. In other words, the variables are
uncorrelated.

Definition: Correlation means the relationship between two
variables where, with changes in the values of one variable, the
values of the other variable also change.

Correlation is also known as Co-variation.


(a) Positive, Negative and Zero Correlation.
Positive Correlation: A correlation is said to be positive when
high values of one variable are accompanied by high values of the
other, and low values of one are accompanied by low values of the
other. In positive correlation we find that the two sets of variables
always vary in the same direction.

Negative Correlation: In this case high values of one variable
are accompanied by low values of the other. In other words,
if the values of two variables change in opposite directions, then it is
negative correlation.

Zero Correlation: Here some high values of one variable are accompanied by
low values of the other, and others are accompanied by high values. In this case,
the paired observations are randomly scattered. The variables are then
said to be uncorrelated.

Example:
Positive Correlation        Negative Correlation
   x       y                   x       y
   4      10                  10      20
   6      14                  15      18
   8      18                  20      16
  10      22                  25      14
  12      26                  30      12


(b) Simple, Partial and Multiple Correlation.
The distinction is based upon the number of variables used.

When only two variables are studied it is a simple correlation.
When there are three or more variables for comparison it is multiple
or partial correlation. In multiple correlation, three or more variables
are studied simultaneously. For example, the yield of a commodity is
studied together with the amount of rainfall and the amount of fertilizers
used.


(c) Linear and Non-linear (Curvilinear) Correlation.

Correlation will be linear if the variations in the values of two
variables are in a constant ratio.

Example:
   x       y
  10      50
  15      75
  20     100
  25     125
  30     150

It is clear that the ratio of change between the two variables is
the same. If such variables are plotted on graph paper, we will find
a straight line.

Conversely, if the variations of the values of the variables do not
bear a constant ratio, we will find non-linear or curvilinear correlation.

The difference between linear and non-linear correlation will be clear
from the following diagrams:

[Fig. 38: Positive linear correlation; non-linear correlation]


Methods of Studying Correlation. 


The following are some of the important methods of studying
correlation:

(1) Scatter Diagram Method,
(2) Karl Pearson's Coefficient of Correlation,
(3) Rank Method.


(1) Scatter Diagram Method: 


Scatter diagram is a special type of dot chart. For this method
the given data are plotted on a graph paper in the form of dots. For
each pair of x and y values, we put a dot (or point) and thus we
obtain as many dots as the number of observations.


Fig. 39: (a) Positive Correlation, (b) Negative Correlation, (c) Absence of Correlation


If now these plotted dots (or points) show some trend either upward or
downward, then the two variables (x and y) are said to be correlated,
or otherwise not correlated. If again the trend of the points is upward,
moving from lower left-hand corner to upper right-hand corner, then
correlation is positive (r = +1 when all the points lie exactly on a
rising straight line; r is the coefficient of correlation). On the other
hand, if the movement is reverse, i.e., the dots move from upper left-hand
corner to lower right-hand corner, then correlation is negative (r = -1
when all the points lie exactly on a falling straight line).
The idea will be clear from the diagrams (Fig. 39).

In Fig. 39(a), the values of the two variables move in the same
direction, the correlation is positive and r = +1.

In Fig. 39(b), we find negative correlation and r = -1, as the
values move in reverse direction.

In Fig. 39(c), we do not get any trend line and hence it shows the
absence of correlation and r = 0.


Example.
Given the following pairs of values of the variables X and Y:
Os 5 6 10 14 16 
a a 5 8 12 17" 90 
(a) Make a scatter diagram.


(b) Do you think that there is any correlation between the
variables X and Y? Is it positive or negative?
Hints: After drawing the scatter diagram it will be seen that the correlation
exists and it is positive, i.e., r > 0.
Note. In the scatter diagram if the plotted points (or dots) are close to each
other, then there would be a high degree of correlation. In such case, a straight line
may also be made to pass through such points.
Correlation Coefficient.


In the Scatter Diagram of Fig. 39(c), let the origin be shifted
to O' whose co-ordinates are $(\bar{x}, \bar{y})$ with respect to the original axes
OX and OY of the rectangular co-ordinate system, and let the two new axes
be O'X' and O'Y'. Here $\bar{x}$ and $\bar{y}$ are the means of the x and y variates respectively.


Fig. 40


Now the co-ordinates $(x_i, y_i)$ of any point P (say) with reference to the
original axes will be $x_i'$ and $y_i'$ with reference to the new axes O'X'
and O'Y' respectively, where $x_i' = x_i - \bar{x}$ and $y_i' = y_i - \bar{y}$.

The points $(x_i', y_i')$ in the Scatter Diagram are now distributed
over the four quadrants I, II, III and IV (see Fig. 40).

Now, since $x_i'$ and $y_i'$ are deviations from $\bar{x}$ and $\bar{y}$ respectively,
then,


(i) In the quadrant I, the values of both x' and y' are positive
and so also their product x'y', and hence $\sum x'y'$ is positive.


(ii) In the quadrant II, the values of x' are negative and those
of y' are positive, so their product x'y' is negative and hence $\sum x'y'$
is negative.

(iii) Now in the quadrant III, the values of both x' and y'
are negative and hence their product x'y' is positive. $\sum x'y'$ is also
positive.

(iv) Lastly in quadrant IV, the values of x' are positive and
those of y' are negative, so that x'y' is negative and hence $\sum x'y'$
is negative.

When the correlation is positive, the general tendency of the
points is to lie in the 1st and 3rd quadrants, so that the sum of x'y'
of all points in the 1st and 3rd quadrants is greater than the sum of
x'y' of all the points in the 2nd and 4th quadrants, and hence the sum
of all x'y' becomes a positive quantity.


Similarly in case of negative correlation the concentration of the
points in the 2nd and 4th quadrants is greater than that of the points
in the other quadrants, so that the sum of all x'y' becomes a negative
quantity.

Lastly, when there is no correlation, i.e., when the points are
evenly distributed in all the four quadrants, the sum of all
x'y' will be nearly zero or equal to zero.


Thus the sum of all x'y', i.e., $\sum x'y' = \sum (x-\bar{x})(y-\bar{y})$, seems to be
a natural measure of correlation. But there are two shortcomings
in this measure.


In the first place, the value of $\sum x'y'$ depends on the number of
pairs of observations. Secondly, being an absolute measure, it would
be in terms of the units in which the variables are measured, and it also
depends on the variability of the variables. So the measure
$\sum x'y' = \sum (x-\bar{x})(y-\bar{y})$ may be put into relative terms and the above
difficulties overcome by dividing by n and the product of the standard
deviations $\sigma_x$ and $\sigma_y$. The ratio is Pearson's product moment correlation
coefficient r, where

$$r = \frac{\frac{1}{n}\sum x'y'}{\sigma_x \sigma_y} = \frac{\sum x'y'}{n\,\sigma_x \sigma_y} \qquad \text{(i)}$$

where n = number of pairs of observations,

$$\sigma_x = \sqrt{\frac{1}{n}\sum x'^2}, \qquad \sigma_y = \sqrt{\frac{1}{n}\sum y'^2}.$$

The numerator $\frac{1}{n}\sum x'y' = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y})$ is called the Covariance
between the two variables x and y and is written as cov (x, y).


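This cov (x, y) has analogy with the term variance of a single variable.
For by definition,

$$\mathrm{var}(x) = \frac{1}{n}\sum (x-\bar{x})^2 = \frac{1}{n}\sum (x-\bar{x})(x-\bar{x})$$

$$\mathrm{var}(y) = \frac{1}{n}\sum (y-\bar{y})^2 = \frac{1}{n}\sum (y-\bar{y})(y-\bar{y}).$$

Now, $\mathrm{cov}(x, y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y})$.


So we may also write,

$$r = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}} \qquad \text{(ii)}$$

since $\sigma_x = \sqrt{\mathrm{var}(x)}$ and $\sigma_y = \sqrt{\mathrm{var}(y)}$.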
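Again,

$$\mathrm{cov}(x, y) = \frac{1}{n}\sum (x-\bar{x})(y-\bar{y}) = \frac{1}{n}\sum (xy - x\bar{y} - \bar{x}y + \bar{x}\bar{y})$$

$$= \frac{1}{n}\left[\sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y}\right]$$

$$= \frac{1}{n}\sum xy - \bar{y}\bar{x} - \bar{x}\bar{y} + \bar{x}\bar{y} = \frac{1}{n}\sum xy - \bar{x}\bar{y}.$$

Also,

$$\mathrm{var}(x) = \frac{1}{n}\sum (x-\bar{x})^2 = \frac{1}{n}\sum (x^2 - 2x\bar{x} + \bar{x}^2) = \frac{1}{n}\left[\sum x^2 - 2\bar{x}\sum x + n\bar{x}^2\right]$$

$$= \frac{1}{n}\sum x^2 - 2\bar{x}^2 + \bar{x}^2 = \frac{1}{n}\sum x^2 - \bar{x}^2.$$

Similarly, $\mathrm{var}(y) = \frac{1}{n}\sum y^2 - \bar{y}^2$.


The correlation coefficient may also be written as

$$r = \frac{\frac{1}{n}\sum xy - \bar{x}\bar{y}}{\sqrt{\frac{1}{n}\sum x^2 - \bar{x}^2}\;\sqrt{\frac{1}{n}\sum y^2 - \bar{y}^2}} \qquad \text{(iii)}$$

$$= \frac{\frac{\sum xy}{n} - \frac{\sum x}{n}\cdot\frac{\sum y}{n}}{\sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}\;\sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2}} \qquad \text{(iv)}$$

since $\bar{x} = \frac{\sum x}{n}$, $\bar{y} = \frac{\sum y}{n}$

$$= \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\;\sqrt{n\sum y^2 - \left(\sum y\right)^2}} \qquad \text{(v)}$$


For computational purpose, the last form is generally used.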
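The computational form of the coefficient, which uses only the raw sums of x, y, xy, x² and y², can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `pearson_r` is our own choice, and the data are the five (x, y) pairs of the linear example given earlier in this chapter. The function will fail with a division by zero if either series is constant.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient from raw sums (the computational form)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length (at least 2 pairs)")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# The linear example: every y is 5 times the corresponding x.
x = [10, 15, 20, 25, 30]
y = [50, 75, 100, 125, 150]
r = pearson_r(x, y)
print(round(r, 4))
```

Since the points lie on a rising straight line, the value obtained is +1 (up to rounding in the square roots).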
(2) Karl Pearson's Coefficient of Correlation:


By this coefficient (popularly known as Pearsonian Coefficient
of Correlation) we can measure the extent of relationship between two
sets of data. If the correlation is perfect, then the coefficient is
unity or 1. Of course, correlation may be positive or negative
according to its nature. If again the coefficient is zero, then there
is no correlation.


(a) FOR DEVIATIONS TAKEN FROM ACTUAL A.M.
Pearsonian coefficient of correlation is found by the formula:

$$r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$$

where
$x = X - \bar{X}$, i.e., deviation from A.M. of X-series,
$y = Y - \bar{Y}$, i.e., deviation from A.M. of Y-series.


Example.
Find the coefficient of correlation between X and Y.


X :   1   2   3   4   5   6   7   8   9  10  11
Y :   4   7  10  13  16  19  22  25  28  31  34


    X      Y        x          y        x²     y²     xy
                (= X - X̄)  (= Y - Ȳ)
    1      4       -5        -15       25    225     75
    2      7       -4        -12       16    144     48
    3     10       -3         -9        9     81     27
    4     13       -2         -6        4     36     12
    5     16       -1         -3        1      9      3
    6     19        0          0        0      0      0
    7     22        1          3        1      9      3
    8     25        2          6        4     36     12
    9     28        3          9        9     81     27
   10     31        4         12       16    144     48
   11     34        5         15       25    225     75
Total 66    209     -          -      110    990    330
                                    (= Σx²) (= Σy²) (= Σxy)


Here $\bar{X} = \frac{\sum X}{n} = \frac{66}{11} = 6$ and $\bar{Y} = \frac{\sum Y}{n} = \frac{209}{11} = 19$.


Now $r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}} = \frac{330}{\sqrt{110 \times 990}} = \frac{330}{\sqrt{108900}} = \frac{330}{330} = 1.$


The correlation of the variables X and Y is perfectly positive.
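The table above can be checked mechanically. The following short sketch (variable names are our own) takes deviations from the actual arithmetic means and applies the formula for deviations taken from the actual A.M.

```python
from math import sqrt

X = list(range(1, 12))                           # 1, 2, ..., 11
Y = [4, 7, 10, 13, 16, 19, 22, 25, 28, 31, 34]

mean_x = sum(X) / len(X)                         # 6.0
mean_y = sum(Y) / len(Y)                         # 19.0

# Deviations from the actual means
x = [value - mean_x for value in X]
y = [value - mean_y for value in Y]

sum_x2 = sum(d * d for d in x)                   # column total of x squared
sum_y2 = sum(d * d for d in y)                   # column total of y squared
sum_xy = sum(a * b for a, b in zip(x, y))        # column total of xy

r = sum_xy / sqrt(sum_x2 * sum_y2)
print(sum_x2, sum_y2, sum_xy, r)
```

The three column totals agree with 110, 990 and 330 in the table, and r comes out as 1.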


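Example.


From the following results, find the value of coefficient of
correlation:

$$\sum_{i=1}^{9} (X_i - \bar{X})^2 = 60; \quad \sum_{i=1}^{9} (Y_i - \bar{Y})^2 = 60; \quad \sum_{i=1}^{9} (X_i - \bar{X})(Y_i - \bar{Y}) = 57.$$


We know, $\sum x^2 = \sum (X_i - \bar{X})^2 = 60$, $\sum y^2 = \sum (Y_i - \bar{Y})^2 = 60$,
$\sum xy = \sum (X_i - \bar{X})(Y_i - \bar{Y}) = 57$, here n = 9.


Now from $r = \frac{\sum xy}{\sqrt{\sum x^2 \times \sum y^2}}$
we get $r = \frac{57}{\sqrt{60 \times 60}} = \frac{57}{60} = 0.95.$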


(b) FOR DEVIATIONS TAKEN FROM ASSUMED MEAN


If the actual means of the variables are in fractions, then the
calculation by the above method will be too lengthy. So we shall use
assumed mean for taking deviations. In this case, the following
formula will be used.

$$r_{xy} = r_{d_x d_y} = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\;\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}} \qquad \text{(vi)}$$

where $d_x = \frac{x - A}{C}$ and $d_y = \frac{y - B}{D}$, where A, B, C and D are arbitrary
constants (C and D are positive) and n is the number of pairs of
observations.


Note. For grouped frequency distribution, the same formula is to be used,
with the frequencies taken into account.


Example.
Find the coefficient of correlation of the following data:
X :   1   2   3   4   5   6   7   8   9  10
Y :  46  42  38  34  30  26  22  18  14  10

    X     Y      d_x        d_y        d_x²   d_y²   d_x d_y
               = X - 5   = (Y - 30)/4

    1    46      -4          4         16     16     -16
    2    42      -3          3          9      9      -9
    3    38      -2          2          4      4      -4
    4    34      -1          1          1      1      -1
    5    30       0          0          0      0       0
    6    26       1         -1          1      1      -1
    7    22       2         -2          4      4      -4
    8    18       3         -3          9      9      -9
    9    14       4         -4         16     16     -16
   10    10       5         -5         25     25     -25
Total             5         -5         85     85     -85
              (= Σd_x)   (= Σd_y)  (= Σd_x²) (= Σd_y²) (= Σd_x d_y)


$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\;\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}}
= \frac{10(-85) - 5(-5)}{\sqrt{10 \times 85 - (5)^2}\;\sqrt{10 \times 85 - (-5)^2}}$$

$$= \frac{-850 + 25}{\sqrt{850 - 25}\;\sqrt{850 - 25}} = \frac{-825}{\sqrt{825}\;\sqrt{825}} = \frac{-825}{825} = -1.$$


The variates X and Y are perfectly negatively correlated.
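That the change of origin and scale leaves the coefficient unaltered (for positive divisors) can be checked numerically on this example. In the sketch below the helper `pearson_r` is our own illustrative function; it is applied once to the original values and once to the deviations d_x = X - 5 and d_y = (Y - 30)/4.

```python
from math import sqrt

def pearson_r(us, vs):
    """Correlation coefficient from raw sums of the two series."""
    n = len(us)
    s_u, s_v = sum(us), sum(vs)
    s_uv = sum(u * v for u, v in zip(us, vs))
    s_u2 = sum(u * u for u in us)
    s_v2 = sum(v * v for v in vs)
    return (n * s_uv - s_u * s_v) / (
        sqrt(n * s_u2 - s_u ** 2) * sqrt(n * s_v2 - s_v ** 2)
    )

X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Y = [46, 42, 38, 34, 30, 26, 22, 18, 14, 10]

d_x = [x - 5 for x in X]            # assumed mean 5, divisor 1
d_y = [(y - 30) / 4 for y in Y]     # assumed mean 30, divisor 4

r_original = pearson_r(X, Y)
r_deviations = pearson_r(d_x, d_y)
print(sum(d_x), sum(d_y), r_original, r_deviations)
```

Both calls give -1 (apart from rounding), and the totals of d_x and d_y agree with 5 and -5 in the table.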


Example.


Calculate the Pearson's coefficient of correlation from the
following data using 44 and 26 respectively as the origin of X and Y:


X :  43  44  46  40  44  42  45  42  38  40  42  57


Y :  29  31  19  18  19  27  27  29  41  30  26  10
[C.A. May '78]

    X     Y      d_x        d_y       d_x²    d_y²    d_x d_y
              = X - 44   = Y - 26
   43    29      -1          3          1       9       -3
   44    31       0          5          0      25        0
   46    19       2         -7          4      49      -14
   40    18      -4         -8         16      64       32
   44    19       0         -7          0      49        0
   42    27      -2          1          4       1       -2
   45    27       1          1          1       1        1
   42    29      -2          3          4       9       -6
   38    41      -6         15         36     225      -90
   40    30      -4          4         16      16      -16
   42    26      -2          0          4       0        0
   57    10      13        -16        169     256     -208
Total            -5         -6        255     704     -306
              (= Σd_x)   (= Σd_y)  (= Σd_x²) (= Σd_y²) (= Σd_x d_y)


$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n\sum d_x^2 - \left(\sum d_x\right)^2}\;\sqrt{n\sum d_y^2 - \left(\sum d_y\right)^2}}
= \frac{12(-306) - (-5)(-6)}{\sqrt{12 \times 255 - (-5)^2}\;\sqrt{12 \times 704 - (-6)^2}}$$

$$= \frac{-3672 - 30}{\sqrt{3060 - 25}\;\sqrt{8448 - 36}} = \frac{-3702}{\sqrt{3035}\;\sqrt{8412}}$$

Taking logarithms of the numerical value of r,
log |r| = log 3702 - ½ log 3035 - ½ log 8412
= 3.5684 - ½(3.4821) - ½(3.9249)
= 3.5684 - 1.7411 - 1.9625 = 3.5684 - 3.7036
= -0.1352 = -1 + 1 - 0.1352 = -1 + 0.8648
= $\bar{1}.8648$
|r| = antilog $\bar{1}.8648$ = 0.7324
r = -0.7324 ≈ -0.73.
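Where a calculating aid is available, the logarithm step can be avoided altogether. The sketch below (variable names are our own) recomputes the column totals of this example from the twelve pairs of values and then evaluates the coefficient directly.

```python
from math import sqrt

X = [43, 44, 46, 40, 44, 42, 45, 42, 38, 40, 42, 57]
Y = [29, 31, 19, 18, 19, 27, 27, 29, 41, 30, 26, 10]

d_x = [x - 44 for x in X]                 # deviations from assumed mean 44
d_y = [y - 26 for y in Y]                 # deviations from assumed mean 26
n = len(X)

sum_dx, sum_dy = sum(d_x), sum(d_y)
sum_dx2 = sum(d * d for d in d_x)
sum_dy2 = sum(d * d for d in d_y)
sum_dxdy = sum(a * b for a, b in zip(d_x, d_y))

numerator = n * sum_dxdy - sum_dx * sum_dy
denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
r = numerator / denominator
print(sum_dx, sum_dy, sum_dx2, sum_dy2, sum_dxdy, round(r, 2))
```

The totals reproduce -5, -6, 255, 704 and -306, and the coefficient rounds to -0.73 as found with the log tables.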


Properties of r. [ C. U. B. Com. (Hons.) 1980 ]


(A) Correlation coefficient is a pure number, i.e., it is independent
of the unit of measurement of the variables.
(B) The correlation coefficient does not depend on origin of
reference or scale of measurement.


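Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of n pairs of observations
and also let

$$u_i = \frac{x_i - A}{C} \quad \text{and} \quad v_i = \frac{y_i - B}{D}$$

where A, B, C and D are four arbitrary constants.

Then $x_i = A + Cu_i$ and $y_i = B + Dv_i$

∴ $\bar{x} = A + C\bar{u}$ and $\bar{y} = B + D\bar{v}$.

Also $(x_i - \bar{x}) = A + Cu_i - (A + C\bar{u}) = C(u_i - \bar{u})$

and $(y_i - \bar{y}) = B + Dv_i - (B + D\bar{v}) = D(v_i - \bar{v})$.


Also $\mathrm{var}(x) = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{n}\sum C^2 (u_i - \bar{u})^2$

i.e., $\sigma_x^2 = C^2 \cdot \frac{1}{n}\sum (u_i - \bar{u})^2 = C^2\,\mathrm{var}(u)$

∴ $\sigma_x = |C|\,\sigma_u$

Similarly $\sigma_y = |D|\,\sigma_v$.

Again, $\mathrm{cov}(x, y) = \frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y}) = \frac{1}{n}\sum C(u_i - \bar{u}) \cdot D(v_i - \bar{v})$

$= CD \cdot \frac{1}{n}\sum (u_i - \bar{u})(v_i - \bar{v}) = CD \cdot \mathrm{cov}(u, v)$

$$\therefore\; r_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y} = \frac{CD \cdot \mathrm{cov}(u, v)}{|C|\,|D|\,\sigma_u \sigma_v} = \frac{CD}{|C|\,|D|}\; r_{uv}.$$


Case A. When C and D are of the same sign, then CD is positive
and $\frac{CD}{|C||D|} = 1$, and hence $r_{xy}$ and $r_{uv}$ are equal in magnitude and
sign.

Case B. When C and D are of different signs, then CD is
negative and $\frac{CD}{|C||D|} = -1$, and hence $r_{xy}$ and $r_{uv}$ are equal in magnitude
but opposite in sign.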


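(C) The correlation coefficient lies between -1 and +1.


Let $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ be a set of n pairs of observations
and

$$x_i' = \frac{x_i - \bar{x}}{\sigma_x}, \qquad y_i' = \frac{y_i - \bar{y}}{\sigma_y}.$$

Then $\sum x_i'^2 = \frac{\sum (x_i - \bar{x})^2}{\sigma_x^2} = \frac{n\sigma_x^2}{\sigma_x^2} = n.$

Similarly $\sum y_i'^2 = n$

and $r = \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y} = \frac{1}{n}\sum x_i' y_i'.$


Since the sum of squares of real numbers cannot be negative,
we have

$\frac{1}{n}\sum (x_i' + y_i')^2 \ge 0$

or, $\frac{1}{n}\sum x_i'^2 + \frac{1}{n}\sum y_i'^2 + \frac{2}{n}\sum x_i' y_i' \ge 0$

or, $1 + 1 + 2r \ge 0$

or, $2 + 2r \ge 0$

or, $2(1 + r) \ge 0$

or, $1 + r \ge 0$

or, $r \ge -1$.

Again $\frac{1}{n}\sum (x_i' - y_i')^2 \ge 0$

or, $\frac{1}{n}\sum x_i'^2 + \frac{1}{n}\sum y_i'^2 - \frac{2}{n}\sum x_i' y_i' \ge 0$

or, $2 - 2r \ge 0$
or, $2(1 - r) \ge 0$
or, $1 - r \ge 0$
or, $r \le 1$.

Thus the correlation coefficient must lie between -1 and +1.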

(3) Rank Method (Rank Correlation).

There are some attributes (intelligence, honesty, character,
morality, leadership, etc.) which cannot be measured quantitatively. In
such cases the individuals in the group can be arranged in order, thereby
obtaining for each individual a number indicating the rank in the
group.

Suppose the values of a variable (weight in kg.) are 50, 53, 54,
47, 59. If these figures are arranged in descending order, the figure 59
would receive the 1st rank, 54 the 2nd, 53 the 3rd, 50 the 4th and 47 the
5th rank. The rank of the variable whose value is highest is 1, and so on.

If again there are two or more items having the same value, then the
process of distributing rank is as follows.


Let two items have equal value and let their rank be 4. Now the
two items will be given the average of the ranks which they would
get had there been a slight difference in values. So the average rank
would be (4 + 5)/2 = 4.5 and the rank of the next item would be 6 (and not 5).


Note. Rank may be assigned either in ascending or in descending order. 


Now the process of calculating the coefficient of correlation (r) is
as follows:

(i) Assign ranks to the various items of the two series (if they are not
given)

(ii) Find the differences of the ranks (d)

(iii) Square these differences (d²)

(iv) Use the following formula for finding r:

$$r = 1 - \frac{6\sum d^2}{n^3 - n}, \quad \text{where } n = \text{number of pairs of observations.}$$


This method was developed by C. E. Spearman, a British
psychologist, in 1904.


The value of this coefficient ranges between +1 and -1. If
r = +1, there is complete agreement in the order of ranks and the
ranks are in the same direction. Again if r = -1, there is complete
disagreement in the order of ranks and they are in opposite directions.


In rank correlation we have two types of problems:


(a) Where actual ranks are given.
(b) Where ranks are not given.


(a) WHERE ACTUAL RANKS ARE GIVEN.
Example.


Ten competitors in a voice contest are ranked by three judges
in the following order:


1st Judge :   1   6   5  10   3   2   4   9   7   8
2nd Judge :   3   5   8   4   7  10   2   1   6   9
3rd Judge :   6   4   9   8   1   2   3  10   5   7


Use the method of rank correlation to judge which pair of judges
have the nearest approach to common liking in voice.

     Ranks given by            Differences (d)          Squares of differences (d²)
  1st     2nd     3rd        (i)     (ii)    (iii)        (i)     (ii)    (iii)
 Judge   Judge   Judge     1st-2nd  2nd-3rd  1st-3rd
   1       3       6         -2      -3      -5            4       9      25
   6       5       4          1       1       2            1       1       4
   5       8       9         -3      -1      -4            9       1      16
  10       4       8          6      -4       2           36      16       4
   3       7       1         -4       6       2           16      36       4
   2      10       2         -8       8       0           64      64       0
   4       2       3          2      -1       1            4       1       1
   9       1      10          8      -9      -1           64      81       1
   7       6       5          1       1       2            1       1       4
   8       9       7         -1       2       1            1       4       1
 Total                                                    200     214      60


(i) For the 1st and 2nd judges,

$$r_{12} = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 200}{10^3 - 10} = 1 - \frac{1200}{990} = 1 - 1.212 = -0.212.$$

(ii) For the 2nd and 3rd judges,

$$r_{23} = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 214}{10^3 - 10} = 1 - \frac{1284}{990} = 1 - 1.297 = -0.297.$$

(iii) For the 1st and 3rd judges,

$$r_{13} = 1 - \frac{6\sum d^2}{n^3 - n} = 1 - \frac{6 \times 60}{1000 - 10} = 1 - \frac{360}{990} = 1 - 0.364 = +0.636.$$


The results of these coefficients indicate that the first and third
judges have the nearest approach to common liking in voice.
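The three coefficients of this example can be verified with a short sketch of the rank formula. The function name `spearman_from_ranks` is our own, and the three lists hold the ranks as read off the working table (no ties are assumed).

```python
def spearman_from_ranks(rank_a, rank_b):
    """Rank correlation 1 - 6*sum(d^2)/(n^3 - n), for ranks without ties."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

judge1 = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
judge2 = [3, 5, 8, 4, 7, 10, 2, 1, 6, 9]
judge3 = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

r12 = spearman_from_ranks(judge1, judge2)
r23 = spearman_from_ranks(judge2, judge3)
r13 = spearman_from_ranks(judge1, judge3)
print(round(r12, 3), round(r23, 3), round(r13, 3))
```

Only the pair formed by the first and third judges gives a positive coefficient, which agrees with the conclusion drawn above.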
(b) WHERE RANKS ARE NOT GIVEN.


Example.


The following are the marks obtained by 8 students in English
and Bengali papers. Compute rank coefficient of correlation.


Marks in English :  15  20  28  12  40  60  20  80
Marks in Bengali :  40  30  50  30  20  10  30  60


Computation of Rank Correlation


 Marks in     rank     Marks in     rank     difference      d²
 English               Bengali                   d
   (X)                   (Y)
   15          2         40          6          -4          16
   20          3.5       30          4          -0.5         0.25
   28          5         50          7          -2           4
   12          1         30          4          -3           9
   40          6         20          2           4          16
   60          7         10          1           6          36
   20          3.5       30          4          -0.5         0.25
   80          8         60          8           0           0


 Total         -          -          -           -          81.50

For equal ranks some adjustment in the above formula is required,
i.e., to add $\frac{1}{12}(m^3 - m)$ to $\sum d^2$ for each group of tied items,
where m = number of items whose ranks are common.


Here, $r = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m)\right\}}{n^3 - n}$


The item 20 is repeated 2 times in the X-series, i.e., m = 2 in the
X-series, and the item 30 is repeated 3 times in the Y-series, i.e., m = 3
in the Y-series.

$$r = 1 - \frac{6\left\{81.5 + \frac{1}{12}(2^3 - 2) + \frac{1}{12}(3^3 - 3)\right\}}{8^3 - 8}
= 1 - \frac{6\{81.5 + 0.5 + 2\}}{504} = 1 - \frac{6 \times 84}{504} = 1 - \frac{504}{504}$$

= 1 - 1 = 0.
There is no correlation.


Note. Some statisticians prefer the previous formula without adjustment.
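The assignment of average ranks and the adjustment for equal ranks may be sketched as follows. The three function names are our own illustrative choices; ranks are given in ascending order, and every group of m equal items adds (m³ - m)/12 to the sum of squared differences.

```python
from collections import Counter

def average_ranks(values):
    """Ascending ranks; equal values share the mean of the ranks they occupy."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m)/12 over every group of m equal values."""
    return sum((m ** 3 - m) / 12 for m in Counter(values).values() if m > 1)

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjusted = sum_d2 + tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * adjusted / (n ** 3 - n)

english = [15, 20, 28, 12, 40, 60, 20, 80]
bengali = [40, 30, 50, 30, 20, 10, 30, 60]

r = spearman_with_ties(english, bengali)
print(average_ranks(english), average_ranks(bengali), r)
```

The ranks printed agree with those in the table (3.5 for the two marks of 20, and 4 for the three marks of 30), and the coefficient is 0.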
Bivariate Data.


Previously the methods of summarisation of data having
variation of one character have been discussed. Very often
data may relate to variation in two or more characters. At
present, let us take two variables, represented by x and y. Thus x
may be the height and y the weight of a person. The observations of each
person (i.e., individual) are paired. Thus for n observations we will
find $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Statistical data relating to
simultaneous measurement of two variables are called Bivariate Data.
Again, data relating to one variable only are called Univariate Data.


Example.
Bivariate data of height (x) in cms. and weight (y) in kgs. of
15 persons:


ew: 174.,.170\ 178), 176.180.4178 689,174 180 } 170 


When the number of pairs of observations is large, it is
necessary to form a two-way frequency table, usually known as a Bivariate
Frequency Table or Correlation Table.* The method of framing
such a table is similar to that of the (univariate) frequency distribution table
(discussed earlier). For the n pairs of observations $(x_1, y_1), (x_2, y_2),
\ldots, (x_n, y_n)$ in relation to the two variables x and y (taken above), the
class-intervals of both the series are chosen first, based on the extreme values
of the x and y series. If there are m class-intervals for the x series and n
class-intervals for the y series, a table with m rows and n columns is to be
constructed. So in the table there will be m × n rectangular spaces,
known as cells. The class-intervals of the x and y series are taken as
row-headings and column-headings respectively.

Procedure of framing bivariate frequency distribution of the data
given above.

There are many different numbers both for the x and y variates. In
this case it is better to make some suitable class-intervals for the x and y
series. Here for the x-series the class-intervals 161-165, 166-170,
..., etc. and for the y-series the class-intervals 56-59, 60-63, ..., etc.
are taken.


* A correlation table is also known as Bivariate Frequency Table, since it shows
the frequency distribution of two related variables.

Now the first number of the x series (174) lies in the class (171-175)
and the first number of the y series (62) lies in the class (60-63). Now in the
cell corresponding to (171-175) and (60-63) (i.e., the 2nd cell of the 3rd row),
one tally mark is given. Again the second numbers of the x and y
series (i.e., 170 and 59) lie in the classes (166-170) and (56-59)
respectively. So in their corresponding cell (i.e., the 1st cell of the 2nd row)
one tally mark is given. Similarly for the rest of the numbers tally marks
are given. Now the markings are to be counted and to be written in
the column and row totals.


The bivariate frequency distribution of the above examples is 
shown below : 
Bivariate Frequency Distribution
[Table: height (cm.) classes 161-165, 166-170, 171-175, ... as rows and weight (kg.) classes as columns]


Here the number of observations lying in a cell is known as the cell
frequency.

Note. Here in the x variate there are 5 class-intervals and in the y variate 4
class-intervals. So the total number of cells is 5 × 4 = 20.


* Class-intervals of the x and y series may also be placed in column-headings
and row-headings respectively.
Marginal Distribution and Conditional Distribution.


A univariate distribution (say of the x variable) obtained from the
bivariate distribution, irrespective of the values of the other variable
(y variable), is called a marginal distribution. The column totals
of frequencies show the number of individuals belonging to each class of the
x variate. This shows the frequency distribution of x, known as the marginal
distribution of x in the present context.


Example.
   Marginal Distribution           Marginal Distribution
        of Height                       of Weight
   height      frequency           weight      frequency
   (cm.)                           (kg.)
  161-165          1               56-59           3
  166-170          3               60-63           5
  171-175          6               64-67           5
  176-180          4               68-71           2
  181-185          1
   Total          15               Total          15


Again, a univariate distribution obtained from a bivariate
distribution for a particular value of the other variate is known as a conditional
distribution. Thus we find only one conditional distribution of the x
variate corresponding to each class-interval of the y variate. Similarly,
there is only one conditional distribution of y corresponding to each
class-interval of the x variate.


Example.
   Conditional Distribution           Conditional Distribution
        of Height                          of Weight
 (when weight 64-67 kg.)            (when height 166-170 cm.)
   height      frequency             weight      frequency
   (cm.)                             (kg.)
  161-165          0                 56-59           2
  166-170          0                 60-63           1
  171-175          2                 64-67           0
  176-180          3                 68-71           0
  181-185          0
   Total           5                 Total           3
Conditional Mean Value


The arithmetic mean of the specified conditional distribution
is called the conditional mean value.


Example.


Find the conditional mean values of x for y = 6, 10 from the
following bivariate frequency distribution:


Calculation of Conditional Mean Values of x 


(a) when y=6 (6) when y=10. 
a| t-te a | f-|-f0 
Ba paryd 4 inky sonal seai Sab piss 6 
10}. 2-|, 20 ley yore Sigal 
Lt al GT alo Mba 6} Ayiido ys 
Total) 4 | 40 Total! 5° | 40° 


For (a), A.M. (x) = $\frac{\sum fx}{\sum f} = \frac{40}{4} = 10.$


Example.


The data given below relate to heights and weights of 20 persons.
You are required to form a two-way frequency table with class-intervals
62" to 64", 64" to 66" and so on, and 115 to 125 lbs., 125
to 135 lbs. and so on.


Sl. no. :    1    2    3    4    5    6    7    8    9   10
Height  :   70   65   65   64   69   63   65   70   71   62
Weight  :  170  135  136  137  148  124  117  128  143  129

Sl. no. :   11   12   13   14   15   16   17   18   19   20
Height  :   70   67   63   68   67   69   66   68   67   67
Weight  :  163  139  122  134  140  132  120  148  129  152
[C. A. May 1966]


 Height        Weight (lbs.)
 (inch)   115-125  125-135  135-145  145-155  155-165  165-175  | Total
 62-64      //        /                                         |   3
 64-66      /                 ///                               |   4
 66-68      /         /       //        /                       |   5
 68-70                //                //                      |   4
 70-72                /       /                   /        /    |   4
 Total      4         5       6         3         1        1    |  20


Note. Here class-interval 62-64 (for height) means 62 and less than 64,
and similarly for all other class-intervals of height and weight (i.e., continuous series).


Here tally marks are used to indicate frequencies.

Calculation of Correlation Coefficient from Grouped Data
When the number of observations of X and Y variables is large,
the data are classified into a correlation table. The formula used
is as follows:

$$r = \frac{n\sum f d_x d_y - \left(\sum f d_x\right)\left(\sum f d_y\right)}{\sqrt{n\sum f d_x^2 - \left(\sum f d_x\right)^2}\;\sqrt{n\sum f d_y^2 - \left(\sum f d_y\right)^2}}$$


Note. This formula is the same as that discussed above, where deviations are taken
from assumed mean. The only difference is that here the deviations are also
multiplied by the frequencies.


Explanations of the Symbols Used in the following examples:
(i) x, y indicate mid-values of X and Y series respectively.


(ii) $d_x = \frac{x - A}{C}$ and $d_y = \frac{y - B}{D}$


(iii) f₁, f₂ represent marginal frequencies of x and y distributions
respectively.
(iv) The number within brackets in every cell is the product of
that cell frequency and the corresponding values of d_x and
d_y.

(v) $f d_x d_y$ is the total of the numbers within brackets mentioned
in (iv), in any row or column.

So the above formula may be written as follows (for convenience):

$$r = \frac{n\sum f d_x d_y - \left(\sum f_1 d_x\right)\left(\sum f_2 d_y\right)}{\sqrt{n\sum f_1 d_x^2 - \left(\sum f_1 d_x\right)^2}\;\sqrt{n\sum f_2 d_y^2 - \left(\sum f_2 d_y\right)^2}}$$


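Example.


Calculation of the coefficient of correlation of the above example
(heights and weights of 20 persons). The correlation table gives
n = 20, $\sum f d_x d_y = 20$, $\sum f_1 d_x = 2$, $\sum f_1 d_x^2 = 36$, $\sum f_2 d_y = -5$ and $\sum f_2 d_y^2 = 37$.

$$\text{Now, } r = \frac{n\sum f d_x d_y - \left(\sum f_1 d_x\right)\left(\sum f_2 d_y\right)}{\sqrt{n\sum f_1 d_x^2 - \left(\sum f_1 d_x\right)^2}\;\sqrt{n\sum f_2 d_y^2 - \left(\sum f_2 d_y\right)^2}}$$

$$= \frac{20 \times 20 - 2(-5)}{\sqrt{20 \times 36 - 2^2}\;\sqrt{20 \times 37 - (-5)^2}} = \frac{400 + 10}{\sqrt{720 - 4}\;\sqrt{740 - 25}} = \frac{410}{\sqrt{716}\;\sqrt{715}}$$

or, log r = log 410 - ½(log 716 + log 715)
= 2.6128 - ½(2.8549 + 2.8543)
= 2.6128 - ½(5.7092)
= 2.6128 - 2.8546
= -0.2418
= $\bar{1}.7582$
r = antilog $\bar{1}.7582$ = 0.573.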
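The grouped-data formula can be sketched as below for the height and weight table of 20 persons. The cell frequencies are those of the two-way table; the deviations d_x = (x - 67)/2 and d_y = (y - 140)/10, taken from the mid-values, are assumptions of this sketch, chosen because they reproduce the totals 2, 36, -5, 37 and 20 used in the calculation. All variable names are our own.

```python
from math import sqrt

# Rows: height classes 62-64, ..., 70-72; columns: weight classes 115-125, ..., 165-175
table = [
    [2, 1, 0, 0, 0, 0],
    [1, 0, 3, 0, 0, 0],
    [1, 1, 2, 1, 0, 0],
    [0, 2, 0, 2, 0, 0],
    [0, 1, 1, 0, 1, 1],
]
d_x = [-2, -1, 0, 1, 2]        # assumed: (mid-value of height - 67) / 2
d_y = [-2, -1, 0, 1, 2, 3]     # assumed: (mid-value of weight - 140) / 10

f1 = [sum(row) for row in table]             # marginal frequencies of height
f2 = [sum(col) for col in zip(*table)]       # marginal frequencies of weight
n = sum(f1)

sum_f_dx = sum(f * d for f, d in zip(f1, d_x))
sum_f_dx2 = sum(f * d * d for f, d in zip(f1, d_x))
sum_f_dy = sum(f * d for f, d in zip(f2, d_y))
sum_f_dy2 = sum(f * d * d for f, d in zip(f2, d_y))
sum_f_dxdy = sum(
    table[i][j] * d_x[i] * d_y[j]
    for i in range(len(d_x))
    for j in range(len(d_y))
)

r = (n * sum_f_dxdy - sum_f_dx * sum_f_dy) / (
    sqrt(n * sum_f_dx2 - sum_f_dx ** 2) * sqrt(n * sum_f_dy2 - sum_f_dy ** 2)
)
print(sum_f_dx, sum_f_dx2, sum_f_dy, sum_f_dy2, sum_f_dxdy, round(r, 3))
```

Because the divisors 2 and 10 are both positive, the value found from the deviations is also the coefficient between height and weight themselves.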


EXERCISE 9 


1. (a) Define simple correlation. What is the difference between
a positive correlation and a negative correlation?


[ I.C.W.A. June '75, June '77, C.A. Nov. '75 ]


(b) Define correlation coefficient and state its important
properties (clearly explain all the symbols you use).

[ C.U. B.Com. (Hons.) 1980 ]

2. Mention a few methods of studying correlation. What is a
Scatter Diagram? Indicate by means of suitable scatter diagrams the
different types of correlation that may exist between the variables in
bivariate data. [ I.C.W.A. June '74, June '76 ]
3. Write notes on:
(i) Scatter Diagram.
(ii) Pearsonian Coefficient of Correlation.
(iii) Rank Correlation Coefficient.


4. Following are the heights and weights of 10 students of a
B.Com. class:
Height (inch) :  62  72  68  58  65  70  66  63  60  72


Weight (kg.)  :  50  65  63  50  54  60  61  55  54  65


Draw a scatter diagram and indicate whether the correlation is
positive or negative.


5. Construct a scatter diagram of the data given below and fit a
straight line by free-hand method:


(Average value in lakh Rs.) 


Years: 1965 1966 1967 1968 1969 1970 1971 1972 


Export : 47 64 100 97 196° °°908 Ply 115 


Import: 70 85 100 103 111° 139 183. «115 
(of a commodity).

6. Calculate Pearson's Coefficient of Correlation between
Advertisement cost and sales as per the data given below:
Advertisement


cost in '000 Rs. :  39  65  62  90  82  75  25  98  36  78


sales in lakh Rs. :  47  53  58  86  62  68  60  91  51  84
[C.A. Nov. '75] ( Ans. +0.78 )


7. Making use of the data given below, calculate the coefficient
of correlation:


Case    X    Y        Case    X    Y
  1    10    9          5    12   11
  2     6    4          6    13   13
  3     9    6          7    11    8
  4    10    9          8     9    4


( Ans. +0.896 )
8. Nine students obtained the following percentage of marks in
College Test (X) and in Final University Examination (Y). Calculate
the correlation coefficient.

eer aot 63 73 46 50 60 47 36 60 


Nay 49 72 74 44 58 66 50 30 35 
[I.C.W.A. 1967 ] ( Ans. +0°982 } 


9. Calculate Pearson's Coefficient of Correlation from the
following, taking 100 and 50 as the assumed averages of X and Y
respectively:


X : 104 111 104 114 118 117 105 108 106 100 104 105


Y :  67  55  47  45  45  60  64  68  66  62  69  61
[C.A. Nov. 1976] ( Ans. -0.67 )


10, Find the correlation coefficient between the income and 
expenditure of a wage-earner and comment : 


Month : Jan. Feb. March April May June July 
Income: 46 54 56 56 58 60 62 
Expenditure: 36 40 44 54 42 58 54 


(Ans. +0°769 ) 


11. Calculate the coefficient of correlation for the ages of husband
and wife:
Age of husband :  23  27  28  29  30  31  33  35  36  39


Age of wife    :  18  22  23  24  25  26  28  29  30  32
[I.C.W.A. 1970] ( Ans. +0.996 )


12. Marks in Mathematics and Statistics of 10 students are given 
below : 
Math.: 82 88 48 43 40 22 41 69 35 64 


Stat..: 30. 31, 38 43 33 11,27 .76 40 59 
—Find the coefficient of correlation. 
[LO.W.A. June'74] ( Ans. +0°985 ) 


13. The table below gives the respective heights X and Y of two
samples of 10 plants each grown under two different conditions:
Plant      Sample I      Sample II
 No.        X (cm.)       Y (cm.)
  1           30            45
  2           50            63
  3           42            55
  4           25            48
  5           60            65
  6           28            48
  7           32            50
  8           55            60
  9           58            60
 10           35            49
Find the value of r. ( Ans. +0.9 )


14. Calculate rank correlation coefficient for a group of 10 
students in the following case : 


Marks in History ©: 70 65 68 60 58 55 54 53 50 45 
Marks in Geography: 90 80 76 75 70 50 48 45 42 40 


(Ans. 1) 


15. Following are the ranks obtained by 10 students in two 
subjects—Statistics and Mathematics, To what extent the knowledge 
of students in the two subjects is related ? 


Statistics §: 1°98 45 6 7 g 9 109 


Mathematics: 2 4 1 6 Sooo ee 10 6 6 
( Ans. +0°76 ) 


16. Calculate coefficient of rank correlation from the following 
data : : 


v: 48 83 40 9 16 16 65 24 16 57 
foe een ene te OIL, 
y: 13 18 24 6 465 4 20 9 9 #19 


(Ans. +0°738 ) 


17. From the following data caleulate the coefficient of rank 
correlation between @ and y : j 


@: 86 56 20° 65 42 33 44 50 15 60 


y: 60> 35. 70 95 58 75.60 45 80 38 
= (Ans. —0'997 ) 




18. In a contest, two judges ranked eight candidates A, B, C, D,
E, F, G and H in order of their performance, as shown in the
following table. Find the rank correlation coefficient:


Candidate    :  A  B  C  D  E  F  G  H


First judge  :  5  2  8  1  4  6  3  7
Second judge :  4  5  7  3  2  8  1  6


[I.C.W.A. June '75] ( Ans. +2/3 )


19. Find the value of coefficient of correlation from the following
values:


(a) $\sum_{i=1}^{11} (X_i - \bar{X})^2 = 110$; $\sum_{i=1}^{11} (Y_i - \bar{Y})^2 = 990$; $\sum_{i=1}^{11} (X_i - \bar{X})(Y_i - \bar{Y}) = 330$
( Ans. 1 )

(b) $\sum_{i=1}^{100} X_i = 280$; $\sum_{i=1}^{100} Y_i = 60$; $\sum_{i=1}^{100} X_i^2 = 2{,}384$; $\sum_{i=1}^{100} Y_i^2 = 117$;
$\sum_{i=1}^{100} X_i Y_i = 438.$


[ I.C.W.A. July '71 ] ( Ans. 0.75 )


20. Marks obtained in Statistics and Auditing by 24 students 
are given below. Prepare a bivariate frequency distribution table. 


Marks | Marks Marks | Marks Marks | Marks 
Sl. i 3 Sl. : t Sl. x 
No am Ue Nos an an, Noi mn a 
“| Stat. Audit. Stat. | Audit, Stat. Audit, 
nt 22 16 17 a7 15 
2 23 18 18 27 16 
3 24 18 19 26 18 
4 24 17 20 28 19 
5 23 16 a1 25 19 
6 25 17 22 24 16 
q 93 17 23 23 17 
8 22 17 24 25 19) 


Hints: The marks in Statistics assume only 7 values from 22 to 28
and those of Auditing assume only 5 values from 15 to 19. Against these
values tally marks are to be plotted to form the table, similar to a
discrete series table. ( Ans. freq. : 3, 9, 4, 4, 1, 2, 1
freq. : 1, 9, 6, 4, 4 )
21. The ages of 20 husbands and wives are given below. Form
a two-way frequency table showing the relationship between the ages
of husbands and wives with the class-intervals 20-25, 25-30, etc.

[C. A. May, 1970]


Sl. No.        :   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
Age of husband :  28  37  42  25  29  47  37  35  23  41  27  39  23  33  36
Age of wife    :  23  30  40  26  25  41  35  25  21  38  24  34  20  31  29


Sl. No.        :  16  17  18  19  20
Age of husband :  32  22  29  38  48
Age of wife    :  35  23  27  34  47
( Ans. freq. : 5, 5, 4, 3, 2, 1; freq. : 3, 5, 2, 6, 2, 2 )
22. From the following table obtain the conditional mean values
of Y for given values of X = 0, 1, 2, 3:


(Ans. “8, 2°5, 2, 15°) 

23. Calculate the coefficient of correlation between the marks 
obtained by a batch of 100 students in Accountancy and Statistics 
ag given in the following table : 


Marks in Accountancy 


Marks in 

Statistics | 99 39 30-40 40-50 60-60 60—70| Total 
1595 | 5 9 3 17 
25—35 10 95 D 37 
35—45 1 12 2 15 
45—55 ae \ one) G65 25 
55—~65 Seance 6 


Total 5 20 44 24 7 100 


( Ans, 0°795 ) 
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24. The following are the marks obtained by the students of a 
class in Statistics and Accountancy : 


Marks in Marks in Marks in Marks in 


Shp Statistics Account, | S* No- Statistics Accownt. 
1 15 13 13 14 11 
2 0 1 14 9 3 
3 1 2 15 8 5 
4 3 7 16 13 4 
5 16 8 17 10 10 
6 2 9 18 13 11 
7 11 14 
8 11 7 
9 12 18 

10 8 16 
11 9 15 
¥/ 3 


me 
p 


Prepare a correlation table taking the magnitude of each
class-interval as 4 marks and the first class-interval as equal to 0 and
less than 4. Calculate Karl Pearson's coefficient of correlation between
the marks in Statistics and Accountancy. ( Ans. 0.578 )


25. The following table gives the frequency according to age-groups
of marks obtained by 67 students in an intelligence test.
Measure the degree of relationship between age and general knowledge.


                 Age in Years


Test marks |  18   19   20   21  | Total


 200-250   |   4    4    2    1  |  11
 250-300   |   3    5    4    2  |  14
 300-350   |   2    6    8    5  |  21
 350-400   |   1    4    6   10  |  21


 Total     |  10   19   20   18  |  67        ( Ans. 0.415 )


26. Family income and its percentege spent on food in the 
case of hundred families gave the following bivariate frequency 
distribution, Calculate the coefficient of correlation, 
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Family Income in Rupees 


; aoghanaia’ 200-800 300—400 400500 500—600 600—700 4 
10—15% = = - 3 FC 
15—20% - 4 9 4 3 
20-25% 7 6 1 5 - 
25—30% 3 10 19 8 mL 


(Ans. 0.488) © 

27. The following table gives a bivariate frequency distribution — 

of 50 clerks according to age in years and pay in rupees; find the ; 
yalue of the correlation coeflicient between the variables. 


Pay 


Age | 250—300 300—350 350—400 400—450 | Total


20-80] 8 8 ~ 
30-40]... 2 5 

4050.) — |. 2 

50-60 | — 


(B) Regression. 


Introduction: In the previous chapter, we have established the
relation between two variables. We are now interested in estimating
(predicting) the value of one variable for a given value of the
other. For example, heights and weights are correlated, so we may
estimate a height for a given weight.

The term ‘regression’ means going back or stepping back, according
to the dictionary. F. Galton’s study regarding the heights of fathers
and their sons revealed an interesting relationship. The deviations of
the mean heights of the sons from the mean height of the race were less
than the deviations of the mean heights of the fathers from the mean
height of the race. When the fathers were above or below the mean,
the sons tended to go back or regress towards the mean. Thus
regression implies going back or returning. Galton represented the
average relationship between these variables graphically and called
the line thus obtained the line of regression. Regression lines
give an idea of the correlation of two series. If the coefficient of
correlation between the heights of fathers and their sons is +0.6, it
means that if a group of fathers had an average height x cms. above the
general average, the average height of their sons would be only 0.6x cms.
above the general average. This going back towards the average is called
regression.


Regression analysis helps in the following ways :


(1) To estimate (or predict) the values of the dependent variable
from values of the independent variable.


(2) To obtain a measure of the error involved in using the
regression line as a basis of estimation.


(3) To obtain a measure of association or correlation that exists
between the two variables.


At present we shall do problems of two variables (i.e., simple
regression), although the analysis may be extended to three or more
variables.


Difference between Correlation and Regression. 


The correlation coefficient is a measure of the degree of relationship
between X and Y, whereas regression analysis studies the nature of
the relationship between the variables.


Regression Equations.


For two variables x and y, we shall have two regression lines:
one regression of x on y and the other of y on x. Regression
equations are algebraic expressions of the regression lines. So for two
regression lines there will be two regression equations. It may be
noted that the regression equation of x on y is used to describe the
variation in the values of x for given changes in y, and the regression
equation of y on x is used to describe the variation in the values of y
for given changes in x.

REGRESSION EQUATION OF X ON Y :
This equation is as follows,
$$x = a + by.$$
Now to determine the constants a and b we are to solve the
following normal equations :
$$\sum x = na + b\sum y,$$
$$\sum xy = a\sum y + b\sum y^2,$$
where n = number of observed pairs of values.


REGRESSION EQUATION OF Y ON X :
The equation is $y = a + bx$, where the values of a and b are to be
obtained by solving :
$$\sum y = na + b\sum x,$$
$$\sum xy = a\sum x + b\sum x^2.$$
Example.


From the following data obtain the two regression equations :


x :   6    2   10    4    8
y :   9   11    5    8    7


[ I. C. W. A. Jan. 1967 ]


Computation of Regression Equations


  x     y    xy    x²    y²
  6     9    54    36    81
  2    11    22     4   121
 10     5    50   100    25
  4     8    32    16    64
  8     7    56    64    49


$\sum x = 30$, $\sum y = 40$, $\sum xy = 214$, $\sum x^2 = 220$, $\sum y^2 = 340$.


For the equation of x on y, $x = a + by$, the normal equations are
$$\sum x = na + b\sum y \quad\text{and}\quad \sum xy = a\sum y + b\sum y^2.$$
Now putting values (n = 5),
$$30 = 5a + 40b \qquad (i)$$
$$214 = 40a + 340b \qquad (ii)$$
Multiplying (i) by 8 and subtracting from (ii) we get
$-26 = 20b$ or, $b = -1.3$.
Now putting this value of b in (i) we get
$30 = 5a + 40(-1.3)$ or, $a = 16.4$.
∴ the regression line of x on y is $x = 16.4 - 1.3y$.
Now for the other regression equation of y on x, we have
$$y = a + bx$$
and the normal equations are
$$\sum y = na + b\sum x \quad\text{and}\quad \sum xy = a\sum x + b\sum x^2.$$

Putting values,
$$40 = 5a + 30b \qquad (i)$$
$$214 = 30a + 220b \qquad (ii)$$
Multiplying (i) by 6 and subtracting it from (ii) we get
$-26 = 40b$ or, $b = -0.65$.
Putting this value of b in (i), we have
$40 = 5a + 30(-0.65)$
or, $a = 11.9$.
Hence, the regression line of y on x is
$$y = 11.9 - 0.65x.$$
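The elimination carried out above can be checked numerically. The following Python sketch solves the two normal equations for the same five pairs of values; the function name `fit_line` is only an illustrative choice and does not come from the text.

```python
# Data of the worked example above.
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]


def fit_line(u, v):
    """Fit v = a + b*u by solving the two normal equations.

    Sum(v)   = n*a      + b*Sum(u)
    Sum(u*v) = a*Sum(u) + b*Sum(u^2)
    """
    n = len(u)
    su, sv = sum(u), sum(v)
    suv = sum(ui * vi for ui, vi in zip(u, v))
    suu = sum(ui * ui for ui in u)
    # Eliminating a between the two equations gives b directly.
    b = (n * suv - su * sv) / (n * suu - su * su)
    a = (sv - b * su) / n
    return a, b


a_yx, b_yx = fit_line(x, y)  # regression of y on x
a_xy, b_xy = fit_line(y, x)  # regression of x on y

print(f"y = {a_yx:.2f} + ({b_yx:.2f})x")
print(f"x = {a_xy:.2f} + ({b_xy:.2f})y")
```

Swapping the roles of the two lists is all that is needed to pass from one regression line to the other.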


To find the Regression Equation of y on x.


The regression equation of y on x can be represented in the
form $y = a + bx$, where a and b are constants which determine the
position of the regression line completely.

Let $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$ be a set of n observations of
the two variates x and y. Through the set of n observations a
straight line of the form
$$y = a + bx \qquad (1)$$
can be fitted. In this case, x is independent and y is dependent.


Now, to find the values of a and b, we are to apply the method
of least squares and solve the following normal equations :
$$\sum y = na + b\sum x \qquad (2)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad (3)$$
In eqn. (2), dividing both sides by n, we get
$$\frac{\sum y}{n} = a + b\,\frac{\sum x}{n} \quad\text{or,}\quad \bar{y} = a + b\bar{x}. \qquad (4)$$
Now, subtracting (4) from (1),
$$y - \bar{y} = b(x - \bar{x}). \qquad (5)$$
Again multiplying (2) by $\sum x$ and (3) by n, we find
$$(\sum x)(\sum y) = na(\sum x) + b(\sum x)^2,$$
$$n(\sum xy) = na(\sum x) + nb(\sum x^2).$$
Subtracting,
$$(\sum x)(\sum y) - n(\sum xy) = b\left[(\sum x)^2 - n(\sum x^2)\right]$$
or,
$$b = \frac{n\sum xy - (\sum x)(\sum y)}{n\sum x^2 - (\sum x)^2} \quad\text{(changing sign)} \qquad (6)$$
or,
$$b = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} \quad\text{(dividing num. and deno. by } n^2\text{)}$$
$$= \frac{\mathrm{cov}(x, y)}{\sigma_x^2}. \qquad (7)$$
From (5), we find $y - \bar{y} = b_{yx}(x - \bar{x})$, where $b_{yx} = \dfrac{\mathrm{cov}(x, y)}{\sigma_x^2}$
($b_{yx}$ indicates y on x).
This is the required regression equation of y on x.


Again we know $r = \dfrac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}$, i.e., $\mathrm{cov}(x, y) = r\,\sigma_x \sigma_y$.


From (7),
$$b_{yx} = \frac{r\,\sigma_x \sigma_y}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x}. \qquad (8)$$


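Regression Equation of x on y.


Proceeding as above, the regression equation of x on y will
be $x - \bar{x} = b(y - \bar{y})$, here b stands for $b_{xy}$,
where
$$b_{xy} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2} \qquad (9)$$
$$= \frac{\mathrm{cov}(x, y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y}. \qquad (10)$$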
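Example.


Find both the regression equations for the data of the previous
example, using the formulae obtained above.
First, for the regression equation of y on x, we have $y - \bar{y} = b_{yx}(x - \bar{x})$,
where
$$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} = \frac{\dfrac{214}{5} - \dfrac{30}{5}\cdot\dfrac{40}{5}}{\dfrac{220}{5} - \left(\dfrac{30}{5}\right)^2} = \frac{42.8 - 6 \times 8}{44 - 36} = \frac{-5.2}{8} = -0.65.$$


Again $\bar{x} = \dfrac{30}{5} = 6$ and $\bar{y} = \dfrac{40}{5} = 8$ ;


∴ the regression equation of y on x is $y - 8 = -0.65(x - 6)$
or, $y - 8 = -0.65x + 3.90$, i.e., $y + 0.65x = 11.9$
(the same result as before).
Next for the regression equation of x on y, we get $x - \bar{x} = b_{xy}(y - \bar{y})$,
where
$$b_{xy} = \frac{\dfrac{\sum xy}{n} - \dfrac{\sum x}{n}\cdot\dfrac{\sum y}{n}}{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2} = \frac{\dfrac{214}{5} - \dfrac{30}{5}\cdot\dfrac{40}{5}}{\dfrac{340}{5} - \left(\dfrac{40}{5}\right)^2} = \frac{42.8 - 6 \times 8}{68 - 64} = \frac{-5.2}{4} = -1.3.$$
∴ the regression equation of x on y is $x - 6 = -1.3(y - 8)$
or, $x + 1.3y = 16.4$ (the same result as before).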
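The same coefficients may be obtained by machine from the covariance and the two variances. The sketch below is only an illustration of that route for the data of this example; the function name `regression_coefficients` is an assumption of ours.

```python
x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]


def regression_coefficients(x, y):
    """Return (b_yx, b_xy) as cov(x, y) divided by the variance of x and of y."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum(a * b for a, b in zip(x, y)) / n - mean_x * mean_y
    var_x = sum(a * a for a in x) / n - mean_x ** 2
    var_y = sum(b * b for b in y) / n - mean_y ** 2
    return cov / var_x, cov / var_y


b_yx, b_xy = regression_coefficients(x, y)
mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)

# Both lines pass through the point of means (mean_x, mean_y).
print(f"y - {mean_y:g} = {b_yx:.2f}(x - {mean_x:g})")
print(f"x - {mean_x:g} = {b_xy:.2f}(y - {mean_y:g})")
```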


Properties of Linear Regression Equations. 


1. The linear regression equation of y on x is $y - \bar{y} = b_{yx}(x - \bar{x})$
and that of x on y is $x - \bar{x} = b_{xy}(y - \bar{y})$, where $b_{yx}$ and $b_{xy}$ are known
as regression coefficients of y on x and x on y respectively.
$$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} \quad\text{and}\quad b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y}.$$

2. The product of the two regression coefficients is equal to the
square of the correlation coefficient,
i.e., $$b_{yx} \times b_{xy} = r\,\frac{\sigma_y}{\sigma_x} \times r\,\frac{\sigma_x}{\sigma_y} = r^2.$$

3. Regression coefficients and correlation coefficient, i.e., $b_{yx}$, $b_{xy}$
and r, have the same sign, i.e., if both the regression coefficients
have a negative sign, r will also be negative, and again if the regression
coefficients are both positive, then r will be positive. If r is zero,
then $b_{yx}$ and $b_{xy}$ will be zero (it is clear from Prop. 1).


4. The two regression lines always intersect at $(\bar{x}, \bar{y})$. The slope
of the regression line of y on x is $b_{yx}$ and that of x on y is $\dfrac{1}{b_{xy}}$.

5. The two regression equations may be written as
$$\frac{y - \bar{y}}{\sigma_y} = r\,\frac{x - \bar{x}}{\sigma_x} \quad\text{and}\quad \frac{x - \bar{x}}{\sigma_x} = r\,\frac{y - \bar{y}}{\sigma_y},$$
which are different. But if $r = \pm 1$, the two equations become
identical. Again if r = 0, then we find $y = \bar{y}$ and $x = \bar{x}$.
In that case y or x cannot be estimated from linear regression
equations.


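Show that the Correlation Coefficient is the Geometric Mean of Regression
Coefficients.


We know
$$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} = \frac{r\,\sigma_x \sigma_y}{\sigma_x^2} = r\,\frac{\sigma_y}{\sigma_x} \qquad (1)$$
(where r = correlation coefficient)
and
$$b_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_y^2} = r\,\frac{\sigma_x}{\sigma_y}. \qquad (2)$$
Now $b_{yx} \times b_{xy} = r\,\dfrac{\sigma_y}{\sigma_x} \times r\,\dfrac{\sigma_x}{\sigma_y} = r^2$ [multiplying (1) and (2)]
or, $r = \pm\sqrt{b_{yx} \times b_{xy}}$ (a relation between the correlation
and regression coefficients), r taking the common sign of the two
regression coefficients.


Note. 1. We know $|r| \le 1$, i.e., $r^2 \le 1$ ; so the product of the regression
coefficients cannot be greater than 1.
2. At least one of the regression coefficients must be numerically less than or equal to 1.
3. Since r is the G. M. of the two regression coefficients, the numerical value of r will lie
between the numerical values of the two regression coefficients.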
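The relation can be verified on any set of paired values. The sketch below uses, purely as an illustration, the five pairs x = 6, 2, 10, 4, 8 and y = 9, 11, 5, 8, 7 from the earlier worked example, and compares r computed directly with the geometric mean of the two regression coefficients.

```python
import math

x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Sums of products and squares of deviations from the means.
s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
s_xx = sum((a - mean_x) ** 2 for a in x)
s_yy = sum((b - mean_y) ** 2 for b in y)

b_yx = s_xy / s_xx
b_xy = s_xy / s_yy
r = s_xy / math.sqrt(s_xx * s_yy)

# Geometric mean of the coefficients, carrying their common sign.
r_from_b = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

print(round(b_yx, 2), round(b_xy, 2), round(r, 4), round(r_from_b, 4))
```

In this data the numerical value of r lies between the numerical values of the two coefficients, as Note 3 states.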


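Show that regression coefficients are independent of change of origin but
not of scale.


We know
$$b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \quad\text{[see formula (6)]}$$
$$= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} \quad\text{[by (7)]}.$$
Let $u = \dfrac{x - A}{h}$ and $v = \dfrac{y - B}{k}$.
Then $x = A + hu$ and $y = B + kv$
or, $\bar{x} = A + h\bar{u}$ and $\bar{y} = B + k\bar{v}$.
Subtracting, $x - \bar{x} = h(u - \bar{u})$ and $y - \bar{y} = k(v - \bar{v})$.
Putting these values in the above formula we get
$$b_{yx} = \frac{\sum hk(u - \bar{u})(v - \bar{v})}{\sum h^2 (u - \bar{u})^2} = \frac{hk \sum (u - \bar{u})(v - \bar{v})}{h^2 \sum (u - \bar{u})^2} = \frac{k}{h}\cdot\frac{\sum (u - \bar{u})(v - \bar{v})}{\sum (u - \bar{u})^2} = \frac{k}{h}\, b_{vu}.$$


Similarly we have $b_{xy} = \dfrac{h}{k}\, b_{uv}$. Hence the result.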
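A small numerical check may make the result concrete. In the sketch below the origins A, B and the scales h, k are hypothetical values chosen only for illustration, and `slope` is our own helper name; the data are the five pairs of the earlier worked example.

```python
def slope(u, v):
    """Regression coefficient of v on u from deviations about the means."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    num = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    den = sum((a - mean_u) ** 2 for a in u)
    return num / den


x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

A, h = 6, 2  # hypothetical origin and scale for x
B, k = 8, 4  # hypothetical origin and scale for y
u = [(xi - A) / h for xi in x]
v = [(yi - B) / k for yi in y]

b_yx, b_vu = slope(x, y), slope(u, v)
b_xy, b_uv = slope(y, x), slope(v, u)

# A change of origin alone (h = k = 1) leaves the coefficient unchanged.
b_shifted = slope([xi - 100 for xi in x], [yi + 50 for yi in y])

print(b_yx, (k / h) * b_vu, b_shifted)
```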


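Deviations taken from actual A. M. of x and y.

We know the regression equation of y on x is

$y = a + bx$, where b ($= b_{yx}$) is the regression coefficient of y on x,

or $\bar{y} = a + b\bar{x}$.

Subtracting, $y - \bar{y} = b(x - \bar{x})$, which is the required regression
equation of y on x.

Again
$$b_{yx} = \frac{\mathrm{cov}(x, y)}{\sigma_x^2} \quad\text{[by (7)]}$$
$$= \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2}, \quad\text{where } X = x - \bar{x},\ Y = y - \bar{y}.$$


Similarly $$b_{xy} = \frac{\sum XY}{\sum Y^2}.$$


Regression equation of x on y is
$$x - \bar{x} = b_{xy}(y - \bar{y}).$$


Note. This formula is convenient when the actual means are not fractional.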


Example.


Obtain the equations of the two lines of regression for the data
given below :


x :  1  2  3  4  5  6  7  8  9


y :  9  8 10 12 11 13 14 16 15
[ I. C. W. A. Dec. 1978 ]
Calculation of Regression Lines


   x     X     y     Y    X²    Y²    XY
   1    -4     9    -3    16     9    12
   2    -3     8    -4     9    16    12
   3    -2    10    -2     4     4     4
   4    -1    12     0     1     0     0
   5     0    11    -1     0     1     0
   6     1    13     1     1     1     1
   7     2    14     2     4     4     4
   8     3    16     4     9    16    12
   9     4    15     3    16     9    12
  45     0   108     0    60    60    57

Here $\bar{x} = \frac{45}{9} = 5$, $\bar{y} = \frac{108}{9} = 12$, $X = x - \bar{x}$ and $Y = y - \bar{y}$.


Regression coefficient of y on x,
$$b_{yx} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{57}{60} = 0.95.$$
Regression line : $y - \bar{y} = b_{yx}(x - \bar{x})$
or, $y - 12 = 0.95(x - 5)$
or, $y = 0.95x + 7.25$.
Again for the regression coefficient of x on y,
$$b_{xy} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (y - \bar{y})^2} = \frac{\sum XY}{\sum Y^2} = \frac{57}{60} = 0.95.$$
Regression line : $x - \bar{x} = b_{xy}(y - \bar{y})$
or, $x - 5 = 0.95(y - 12)$
or, $x = 0.95y - 6.4$.
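The table above lends itself to a direct check. The following sketch, given only as an illustration, rebuilds the deviation columns and the two regression lines for this example; the variable names are our own.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 10, 12, 11, 13, 14, 16, 15]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Deviations from the actual arithmetic means.
X = [xi - x_bar for xi in x]
Y = [yi - y_bar for yi in y]

sum_XY = sum(a * b for a, b in zip(X, Y))
sum_X2 = sum(a * a for a in X)
sum_Y2 = sum(b * b for b in Y)

b_yx = sum_XY / sum_X2
b_xy = sum_XY / sum_Y2

# Intercepts obtained by expanding the lines through (x_bar, y_bar).
a_yx = y_bar - b_yx * x_bar
a_xy = x_bar - b_xy * y_bar

print(f"y = {b_yx:.2f}x + {a_yx:.2f}")
print(f"x = {b_xy:.2f}y - {abs(a_xy):.1f}")
```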
Example.


Test scores :      14 19 24 21 26 22 15 20 19
Sales ('00 Rs.) :  31 36 48 37 50 45 33 41 39
Calculate the coefficient of correlation between the test scores and
the sales. Does it indicate that the termination of services of those
with low test scores is justified ? If the firm wants a minimum sales
volume of Rs. 3,000, what is the minimum test score that will ensure
continuation of service ? [ C. A. Inter Nov. '74 ]


Let x denote the test scores and y denote the sales ('00 Rs.).


Calculation of Regression Equations


   x     X    X²     y     Y    Y²    XY

  14    -6    36    31    -9    81    54
  19    -1     1    36    -4    16     4
  24     4    16    48     8    64    32
  21     1     1    37    -3     9    -3
  26     6    36    50    10   100    60
  22     2     4    45     5    25    10
  15    -5    25    33    -7    49    35
  20     0     0    41     1     1     0
  19    -1     1    39    -1     1     1

 180     0   120   360     0   346   193


Here $X = x - \bar{x}$ and $Y = y - \bar{y}$, with $\bar{x} = \frac{180}{9} = 20$ and $\bar{y} = \frac{360}{9} = 40$.


$$\sigma_x = \sqrt{\frac{\sum X^2}{n}} = \sqrt{\frac{120}{9}} = 3.65, \qquad \sigma_y = \sqrt{\frac{\sum Y^2}{n}} = \sqrt{\frac{346}{9}} = 6.2,$$


$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{n\,\sigma_x \sigma_y} = \frac{\sum XY}{n\,\sigma_x \sigma_y} = \frac{193}{9 \times 3.65 \times 6.2} = 0.9477.$$


We find the coefficient of correlation is high enough to justify
the proposal.


Now the regression of Test Scores (x) on sales (y) is given by


$x - \bar{x} = b_{xy}(y - \bar{y})$,
where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.9477 \times \dfrac{3.65}{6.2}$


= 0.5578 (calculation by log-table)
or, $x - 20 = 0.5578(y - 40)$
or, $x - 20 = 0.5578y - 22.3120$
or, $x = 0.5578y - 2.312$.
For y = Rs. 3000, i.e., 30 ('00 Rs.) we have

$x = 0.5578 \times 30 - 2.312$

$= 16.734 - 2.312 = 14.422 \approx 14$.

A minimum test score of 14 will ensure the continuation of
service.
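The arithmetic of this example can be reproduced without log-tables. The sketch below is only a check under the data given above; because it carries unrounded intermediate values, its figures may differ from the worked ones in the last decimal place.

```python
import math

scores = [14, 19, 24, 21, 26, 22, 15, 20, 19]  # x, test scores
sales = [31, 36, 48, 37, 50, 45, 33, 41, 39]   # y, sales in '00 Rs.
n = len(scores)
x_bar, y_bar = sum(scores) / n, sum(sales) / n

X = [x - x_bar for x in scores]
Y = [y - y_bar for y in sales]
sum_X2 = sum(a * a for a in X)
sum_Y2 = sum(b * b for b in Y)
sum_XY = sum(a * b for a, b in zip(X, Y))

r = sum_XY / math.sqrt(sum_X2 * sum_Y2)

# Regression of test score (x) on sales (y), then the score for y = 30.
b_xy = sum_XY / sum_Y2
min_score = x_bar + b_xy * (30 - y_bar)

print(round(r, 4), round(b_xy, 4), round(min_score, 2))
```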


Example.


From the following results, obtain the two regression equations
and estimate the yield of crops when the rainfall is 29 cms. and the
rainfall when the yield is 600 kg.


              y                  x
         (yield in kg.)    (rainfall in cm.)


Mean         508.4              26.7
S.D.          36.8               4.6


Coefficient of correlation between yield and rainfall = 0.52.
[ C.A. Inter. May 1976 ]
Regression equation of y on x is
$y - \bar{y} = b_{yx}(x - \bar{x})$, where
$b_{yx} = r\,\dfrac{\sigma_y}{\sigma_x} = 0.52 \times \dfrac{36.8}{4.6} = 4.16$
or, $y - 508.4 = 4.16(x - 26.7)$
or, $y - 508.4 = 4.16x - 111.072$
or, $y = 4.16x + 397.328$.
For x = 29, we have $y = 4.16 \times 29 + 397.328$
$= 120.64 + 397.33 = 517.97$ kg.
Regression equation of x on y is
$x - \bar{x} = b_{xy}(y - \bar{y})$,
where $b_{xy} = r\,\dfrac{\sigma_x}{\sigma_y} = 0.52 \times \dfrac{4.6}{36.8} = 0.065$
or, $x - 26.7 = 0.065(y - 508.4)$
or, $x - 26.7 = 0.065y - 33.05$
or, $x = 0.065y - 6.35$.
For y = 600, we find
$x = 0.065 \times 600 - 6.35$
$= 39 - 6.35 = 32.65$ cms.
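When only the means, standard deviations and r are known, both lines follow from the coefficients r·σ(dependent)/σ(independent). The sketch below illustrates this for the figures of the example; `line_from_summary` is a hypothetical helper name of ours.

```python
def line_from_summary(mean_dep, mean_ind, sd_dep, sd_ind, r):
    """Return (a, b) of the line dep = a + b * ind from summary measures."""
    b = r * sd_dep / sd_ind
    a = mean_dep - b * mean_ind
    return a, b


r = 0.52
mean_y, sd_y = 508.4, 36.8  # yield in kg.
mean_x, sd_x = 26.7, 4.6    # rainfall in cm.

a_yx, b_yx = line_from_summary(mean_y, mean_x, sd_y, sd_x, r)  # y on x
a_xy, b_xy = line_from_summary(mean_x, mean_y, sd_x, sd_y, r)  # x on y

yield_at_29 = a_yx + b_yx * 29
rain_at_600 = a_xy + b_xy * 600

print(round(yield_at_29, 2), round(rain_at_600, 2))
```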


Example.
For some bivariate data, the following results were obtained :
The mean value of X = 53.2,
the mean value of Y = 27.9,


the regression coefficient of Y on X = -1.5,
and the regression coefficient of X on Y = -0.2.


Find (i) the most probable value of Y, when X = 60 ;
(ii) the coefficient of correlation between X and Y.
Regression equation of Y on X is $Y - \bar{Y} = b_{yx}(X - \bar{X})$
or, $Y - 27.9 = -1.5(X - 53.2)$


or, $Y = -1.5X + 1.5 \times 53.2 + 27.9$
or, $Y = -1.5X + 79.80 + 27.9$.
For X = 60, we get $Y = -1.5 \times 60 + 79.80 + 27.9$
$= -90 + 107.7 = 17.7$.


Again $r = \pm\sqrt{b_{yx} \times b_{xy}} = \pm\sqrt{(-1.5) \times (-0.2)} = -\sqrt{0.3} = -0.547$.
As both the regression coefficients have negative sign, so also
r will have the same sign.
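The choice of sign for r can be made mechanical. The short sketch below is an illustration only, with a function name of our own; it takes the square root of the product of the two coefficients and attaches their common sign, then repeats the estimate of Y for X = 60.

```python
import math


def correlation_from_coefficients(b_yx, b_xy):
    """r as the geometric mean of the regression coefficients, with their sign."""
    if b_yx * b_xy < 0:
        raise ValueError("regression coefficients must have the same sign")
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


mean_x, mean_y = 53.2, 27.9
b_yx, b_xy = -1.5, -0.2

r = correlation_from_coefficients(b_yx, b_xy)
y_at_60 = mean_y + b_yx * (60 - mean_x)

print(round(y_at_60, 1), round(r, 4))
```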


Example.


The following figures relate to years of service and income
in hundreds of rupees of the employees of an organisation. Find
the initial start for a person applying for a job after having served
in another factory for a period of 12 years in a similar capacity.


Length of service (yrs.) : 11  7  9  5  8  6 10
Income (hundreds of Rs.) :  7  5  3  2  6  4  8


Let x represent length of service and y represent income. Here,
we are to find the initial income after serving 12 yrs. in a similar
capacity, i.e., to find the regression equation of y on x.
Calculation for Regression Equation of y on x


 Year    x - x̄          Income ('00 Rs.)   y - ȳ
   x    (x̄ = 8)    X²          y          (ȳ = 5)    Y²    XY
          X                                    Y
  11       3        9          7             2        4     6
   7      -1        1          5             0        0     0
   9       1        1          3            -2        4    -2
   5      -3        9          2            -3        9     9
   8       0        0          6             1        1     0
   6      -2        4          4            -1        1     2
  10       2        4          8             3        9     6
$\sum x = 56$, $\sum X^2 = 28$, $\sum y = 35$, $\sum Y^2 = 28$, $\sum XY = 21$ ;
$\bar{x} = \frac{56}{7} = 8$ ; $\bar{y} = \frac{35}{7} = 5$.

Since we are to find the regression equation of y on x, we shall use
$$b_{yx} = \frac{\sum XY}{\sum X^2} = \frac{21}{28} = 0.75.$$


Again to find the initial income for the person of 12 years’ experience
in the same capacity, substituting the respective values we find,


$y - 5 = 0.75(x - 8)$
or, $y - 5 = 0.75x - 6$
or, $y = 0.75x - 1$.
Now, for x = 12, $y = 0.75 \times 12 - 1 = 9 - 1 = 8$.
∴ reqd. initial start = Rs. 800.
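The estimate may be confirmed with a few lines of Python. The sketch assumes the seven pairs of values given above (the third income figure being 3, as the total of 35 requires); the variable names are illustrative.

```python
service = [11, 7, 9, 5, 8, 6, 10]  # x, years of service
income = [7, 5, 3, 2, 6, 4, 8]     # y, income in hundreds of rupees
n = len(service)
x_bar, y_bar = sum(service) / n, sum(income) / n

sum_XY = sum((x - x_bar) * (y - y_bar) for x, y in zip(service, income))
sum_X2 = sum((x - x_bar) ** 2 for x in service)

b_yx = sum_XY / sum_X2

# Estimated income (in hundreds of rupees) after 12 years of service.
start = y_bar + b_yx * (12 - x_bar)

print(b_yx, start)
```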




EXERCISE 10 


1. Define ‘regression’ ; why are there two regression lines ? 


2. Distinguish clearly between correlation and regression as
concepts used in statistical analysis.


3. The heights (cm.) of a group of fathers and sons are given
below :


Ht. of fathers : 158 160 163 165 167 170 167 172 177 181
Ht. of sons    : 163 158 167 170 160 180 170 175 172 175
Find the lines of regression and estimate the height of son
when the height of the father is 164 cms.
(Ans. y = 0.669x + 56.6, x = 0.704y + 49.02, ht. of son = 166.3 cms.)


4. Find the regression equations from the following data :
Age of husband : 18 19 20 21 22 23 24 25 26 27
Age of wife    : 17 17 18 18 18 19 19 20 21 22
(Ans. x = 1.747y - 10.52 ; y = 0.527x + 7.04)


5. From the following table, showing age of cars of a certain
make and annual maintenance costs, obtain the regression equation
for costs related to age.


Age of cars (yrs.) :             2    4    6    7    8   10   12
Annual maintenance cost (Rs.) : 1600 1500 1800 1900 1700 2100 2000
(Ans. y = 52.86x + 1429.98)


6. From the following data, obtain the two regression
equations :


Sales :     91 97 108 121 67 124 51 73 111 57


Purchases : 71 75 69 97 70 91 39 61 80 47


Hence or otherwise find the correlation coefficient between sales and
purchases.


[ C.A. May '77 ] (Ans. Y = 0.61X + 15.1 ; X = 1.36Y - 5.2 ; r = +0.91)


7. From the following data, find the regression equation which
you think to be fit :


Age :            56  42  72  36  63  47  55  49  38


Blood pressure : 147 125 160 118 149 128 150 145 115
(Ans. age (x), blood pr. (y) (say) ; y = 1.27x + 72.86)
8. The height (inch) and quantity of dry bark (oz.) of 8 sinkano
trees are as follows :
Height (x): 8 11 7 10 12 5 4 6
Quantity (v): 19 80 95 44 88 25 20 97 
Find the regression equation of y on x. If the height is 15 inches,
find the quantity of dry bark. (Ans. y = 2.57x + 8.28 ; 46.83 oz.)


9. Find the two regression equations from the following data.
If the age of wife is 19 years, find that of husband, and again if the
age of husband is 30 years, find that of wife.
Age of husband : 25 22 28 26 35 20 22 40 20 18


Age of wife    : 18 15 20 17 22 14 16 21 15 14
(Ans. x = 2.23y - 12.76, y = 0.385x + 7.34,
husband’s age (x), wife’s age (y) ; 29.6 ; 18.9)


10. A sample of size n = 16 yields the following sums :
$\sum x = 749$, $\sum y = 77.90$, $\sum y^2 = 454.81$, $\sum xy = 3156.80$
and $\sum x^2 = 49,177$.
Compute the linear regression equation of x on y.
[ C. U. B. Com. (Hons.) 1980 ] (Ans. x = 41.22y - 195.38)


11. In a correlation study, the following values are obtained :


                            X      Y

Mean                        65     67

S. D.                       2.5    3.5
Coefficient of correlation  0.8


Find the two regression equations that are associated with the
above values.
(Ans. X = 0.57Y + 26.81 ; Y = 1.12X - 5.8)


12. You are given the data relating to purchases and sales.
Obtain the regression equations by the method of least squares and
estimate the likely sales when the purchases equal 100.


Purchases : 62 72 98 76 81 56 76 92 88 49


Sales : 112 124 131 117 132 96 120 136 97 85
[ C. A. Inter. May ’75 ] (Ans. x = 2.8 + 0.68y ; y = 56.5 + 0.78x ; 134.5)


13. You are given the following results for the heights (X) and
weights (Y) of 1000 business executives :

$\bar{X} = 68$ inches, $\sigma_x = 2.5$ inches,

$\bar{Y} = 150$ lbs., $\sigma_y = 20$ lbs., r = 0.6.
Estimate from the above data :

(i) The weight of a particular executive, who is 5 feet tall.
(ii) The height of a particular executive, whose weight is
200 lbs.
(Ans. (i) 111.6 lbs. ; (ii) 71.75 inches)


14. Find $\sigma_y$, given that variance of X = 36, $b_{xy} = 0.8$, r = 0.5.
(Ans. 3.75)


INDEX NUMBERS


Introduction. 


Index numbers are devices which indicate by their variations the
changes in the magnitude of a group of related variables. The group
of related variables may be prices of a specified set of commodities or 
the volume of production in different sectors or such other concept as 
intelligence, health or efficiency. They may measure variations over 
time, over space or between similar categories such as institutions, 
objects, etc.

The commonest type of such index measures is one known as 
Price Index which measures variation in prices over a span of time. It 
enables us to know how the price level of a group of commodities has 
changed at certain periods of time as compared with another period,
called base period. When the price of one commodity rises while the 
price of another falls and the prices of various commodities all react in 
different degrees, the index number shall not give here any indication 
of changes in the values of the individual commodity but will reveal the 
average net effect of all the changes. Retail Price Index, Wholesale 
Price Index, Index of Wage-rates (wages being the price of labour) 
are some of price indices. 

Similarly, a quantity index enables us to know the average 
changes in the quantities of the items belonging to a group of commo- 
dities. An Index of industrial production or an Index of the volume of 
exports are some of the examples of quantity index. 

Index numbers enable comparison to be made between the levels 
of prices and wages, or between the levels of production and wages. 
Index numbers also measure the amount of change in the productivity
within a firm, or in the value of trade, or the difference in level of
intelligence among students of different institutions.


Compilation of Index Numbers. 


The most common and widely used type of index is the Price Index,
which measures variation in price level over two different periods of
time. As such, the theory will be discussed here in terms of prices
over two different periods. The principles involved will of course apply
to any other type of index number.


Basic Principle of Construction


(i) PRICE RELATIVE METHOD


In constructing the index number, the basic device, in general, is
to calculate a set of relatives by expressing the prices of a given period
as percentages of the prices of the base period and then to average the
relatives so derived.


(ii) AGGREGATE METHOD


In this method, the index number is calculated by expressing the
aggregate of the prices of the given period as a percentage of the
aggregate of the prices of the base period.


Note 1. Price Relative (P.R.) $= \dfrac{\text{Given period price}}{\text{Base period price}} \times 100$.


Note 2. Simple Index Number is an index number which measures change in a
single item.
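The two basic devices can be put side by side in a short sketch. The prices below are illustrative figures in rupees for two items, and the function name `price_relative` is our own; nothing here is a prescribed procedure.

```python
def price_relative(given_price, base_price):
    """Given period price expressed as a percentage of the base period price."""
    return given_price / base_price * 100


# Illustrative prices (Rs.) of two items in the base and given periods.
base = {"rice": 2.20, "egg": 3.10}
given = {"rice": 3.45, "egg": 4.50}

# (i) Price relative method: average the relatives of the items.
relatives = {item: price_relative(given[item], base[item]) for item in base}
index_of_relatives = sum(relatives.values()) / len(relatives)

# (ii) Aggregate method: compare the totals of the prices.
aggregate_index = sum(given.values()) / sum(base.values()) * 100

print({k: round(v, 1) for k, v in relatives.items()})
print(round(index_of_relatives, 1), round(aggregate_index, 1))
```

The two methods need not agree, since the aggregate method lets the dearer item dominate the total.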


Problems in the Construction of Index Numbers. 


We are generally confronted with the following five major
problems while constructing any Index Number :


(i) Definition of Purpose of the Index Number.
(ii) Selection of Items and Collection of Data.
(iii) Selection of Base Period.

(iv) Selection of Weights.

(v) Choice of Average.

(i) Definition of Purpose of the Index Number 


The first essential point to be considered is a precise statement
of the purpose for which a particular index number is to be constructed.
All index numbers will not serve the same purpose ; there is no
all-purpose index number. The purpose of construction will guide the
selection of items, the selection of base period, the selection of
weights, etc. Thus, for compilation of a consumers’ price index number,
wholesale prices should not be included. To study the changes in the
general price level after independence, the base year will be the year
immediately preceding the year of independence. For construction of an
index number of building materials, an item like cement is more
important than glass and hence cement will receive a higher weightage
than glass.


As the purpose will determine the data to be collected, the data
available, or the lack of it, may necessitate the modification of the
purpose.
(ii) Selection of Items and Collection of Data


Items selected for the construction of index number should be
relevant. Haphazard selection will not only be confusing but
sometimes useless. In constructing a consumers’ price index for the
working class, items like T. V., Refrigerator, Motor-car should not be
taken into consideration.


Index number takes into account a large number of items. But 
it might not be practicable to include all the relevant items in view of 
cost and time. Hence, there is the necessity of sampling. Items 
selected should be representative of all the relevant items. 


The items included should be neither too many nor too few. The
larger the number of items included, the more representative the index
number will be, but this will create difficulty in the accurate
collection of data, apart from the increased cost and time. Again,
inclusion of too few items in the index number will make it unreliable.


Hence, items selected should be both adequate in number and 
representative in character. 


Sometimes, the prices of one and the same item become
incomparable between different dates due to considerable change in its
quality. Hence, items of standard quality should be entered in the
index number construction.


Again, for the same item the price quotations may vary from place to
place and from quality to quality, and since all prices at all transactions
and of all varieties cannot be taken into account, samples of prices of all
varieties and at all transactions should be taken in such a manner that
they will represent adequately the overall situation.

As the tastes and habits change with the passage of time, some 
new items become important while some old items become obsolete and 
hence the selection of items for preparing the index number is to be 
very carefully done making necessary additions and alterations. 


The data collected should be basically accurate, unbiased and
dependable and may be collected from published sources or by
appointing some persons or institutions who can supply them as and when
required. The accuracy of the data is to be checked properly before
use.


(iii) Selection of Base Period 


The base period must be a period of economic stability. The
prices in this period must not be fixed by law ; this period must be one
in which no abnormal increase or decrease is found.


The problem of choice of a base period having economic stability
is very difficult. To solve this, the aggregate or average of some periods
may be taken as base so that abnormalities in one direction will
neutralize the abnormalities in another direction.


Base period should neither be too long, nor too short. Generally,
a year is taken as base. A week, or a month, or a group of years or
months or weeks may also be taken as base.


Base period should not be too distant from the given period as
this may cause a change in the importance of some items. Further,
the quality of the items may differ substantially if the interval is too
wide, and the prices at the two periods also become incomparable. The
farther the base period is from the given period, the larger the
variability of the relative change will be, and hence shifting
of the base period to some other period which is not too distant from
the given period becomes necessary.


(iv) Selection of Weights 


In constructing the index numbers all the items included are not
of equal importance in the sense that a similar change in the prices of
different items does not affect the price level to the same extent. Therefore,
it is necessary that due consideration be given to the relative
importance of different items. This is done by allocating weights. Thus,
we have two types of indices : (1) Simple or Unweighted index and (2)
Weighted index. In simple aggregate index, all the items are
considered equally important, hence the weight assigned to each item is
the same. Similarly in simple average of relatives the weight assigned to
each relative is the same. So, in reality the simple index numbers are
arbitrarily weighted index numbers. But as it is reasonable that each
item included should be allowed to have due importance in the index
number, it should be deliberately weighted. To construct a weighted
index number, generally, price relatives are weighted by values, prices
by quantities and quantities by prices. The prices and quantities used
as weights may be the prices or quantities of (1) base period, (2) given
period, (3) average or total of base and given periods, (4) average of all
the periods included in the index number or (5) average of several
periods thought to be typical.


(v) Choice of Average 


The choice of average will normally be dictated by practical
consideration. In theory it is possible to use any form of
average. Arithmetic mean is used in most of the important indices
due to its intelligibility and simplicity in calculation. Some, however,
use Geometric mean for it is mathematically suitable and is less
susceptible to a few items of very high and low values. Median and Mode
are generally erratic and other averages are complicated.
Methods of Constructing Index Numbers.
The following are the different methods of construction of Index
Numbers :
(1) METHOD OF AGGREGATES
(a) Simple aggregate of Prices.
(b) Weighted aggregate of Prices.


(2) METHOD OF RELATIVES
(a) Simple average of Price Relatives.
(b) Weighted average of Price Relatives.
Symbols and Notations to be used throughout this chapter :
$p_0', p_0'', p_0''', \dots$ represent prices in the base period denoted
by the suffix 0.
$q_0', q_0'', q_0''', \dots$ represent quantities in the base period
denoted by the suffix 0.
$p_n', p_n'', p_n''', \dots$ represent prices in the given period denoted
by the suffix n.
$q_n', q_n'', q_n''', \dots$ represent quantities in the given period
denoted by the suffix n.

$p_0' + p_0'' + p_0''' + \dots = \sum p_0$ = Total of base period prices.

$q_0' + q_0'' + q_0''' + \dots = \sum q_0$ = Total of base period quantities.

Similarly $\sum p_n = p_n' + p_n'' + p_n''' + \dots$
$\sum q_n = q_n' + q_n'' + q_n''' + \dots$
$\sum p_0 q_0 = p_0' q_0' + p_0'' q_0'' + p_0''' q_0''' + \dots$
$\sum p_n q_n = p_n' q_n' + p_n'' q_n'' + p_n''' q_n''' + \dots$

$I_{0n}$ = Index Number with base period 0 and given period n.

We shall, however, use I in place of $I_{0n}$ when there is no
possibility of confusion.


(1-a) Simple Aggregate of Prices 


In this method, the Price Index will be the aggregate of prices of 
the given period expressed as a percentage of that in the base period. 


Symbolically, $$I_{0n} = \frac{\sum p_n}{\sum p_0} \times 100. \qquad (1)$$


EXAMPLE

From the data given below, calculate the index number of prices
for all the years with reference to 1970 as the base year using Simple
Aggregate Method.
                    Prices
Items      1970    1971    1972    1973

Rice       56.4    58.7    57.2    60.4
Wheat      48.2    50.3    51.7    53.3
Pulse     121.3   124.6   130.3   133.5


SOLUTION :


Simple Aggregate Index Number $= \dfrac{\sum p_n}{\sum p_0} \times 100$.


Let $p_0, p_1, p_2, p_3$ represent prices in 1970, 1971, 1972 and 1973
respectively.


Items       p0      p1      p2      p3


Rice       56.4    58.7    57.2    60.4
Wheat      48.2    50.3    51.7    53.3
Pulse     121.3   124.6   130.3   133.5


Total     225.9   233.6   239.2   247.2


$\sum p_0 = 225.9$, $\sum p_1 = 233.6$, $\sum p_2 = 239.2$, $\sum p_3 = 247.2$.
Price Index Number for the year 1971 with 1970 as base year


$$I_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{233.6}{225.9} \times 100 = 103.4.$$


Price Index Number for 1972 with 1970 as base year


$$I_{02} = \frac{\sum p_2}{\sum p_0} \times 100 = \frac{239.2}{225.9} \times 100 = 105.9.$$
Similarly, $$I_{03} = \frac{\sum p_3}{\sum p_0} \times 100 = \frac{247.2}{225.9} \times 100 = 109.4.$$
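The three index numbers of this example can be reproduced in a few lines. The sketch below merely repeats the simple aggregate calculation for each year; the dictionary layout is our own choice.

```python
# Prices of rice, wheat and pulse in each year, as in the example.
prices = {
    1970: [56.4, 48.2, 121.3],
    1971: [58.7, 50.3, 124.6],
    1972: [57.2, 51.7, 130.3],
    1973: [60.4, 53.3, 133.5],
}

base_total = sum(prices[1970])

# Simple aggregate index: total of given-year prices as a percentage
# of the total of base-year prices.
index = {
    year: round(sum(year_prices) / base_total * 100, 1)
    for year, year_prices in prices.items()
}

print(index)
```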
EXAMPLE
From the following data, construct an index for 1978 taking 1970
as base using Simple Aggregate Method :
                             1970        1978
Price of Rice per kg.      Rs. 2.20    Rs. 3.45
Price of Egg per doz.      Rs. 3.10    Rs. 4.50


SOLUTION :
Construction of Price Index :
                             1970        1978
Price of Rice per kg.      Rs. 2.20    Rs. 3.45
Price of Egg per dozen     Rs. 3.10    Rs. 4.50


Total                      Rs. 5.30    Rs. 7.95


$$I_{0n} = \frac{7.95}{5.30} \times 100 = 150.$$


Now, instead of taking the price of rice per kg., if we take the price of
rice per quintal, we have,
                             1970          1978
Price of Rice per quintal  Rs. 220.00    Rs. 345.00
Price of Egg per dozen     Rs.   3.10    Rs.   4.50


Total                      Rs. 223.10    Rs. 349.50


$$\therefore\ I_{0n} = \frac{349.50}{223.10} \times 100 = 156.7.$$


It is now seen that the index number has increased from 150 to
156.7 as we take the price of rice per quintal instead of the price of
rice per kg. Thus, this method has a serious defect, as a difference in
the unit of quotation makes a difference in the index. So, manipulation
becomes possible by quoting prices in different units, which is not
desirable. As a matter of fact, equal importance is given to all the
items here, which is not correct. For these reasons the simple
aggregative index is rarely used.
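The dependence of the simple aggregate index on the unit of quotation is easily demonstrated. The sketch below recomputes the index for the rice and egg prices of this example, first per kg. and then per quintal; the function name is an illustrative choice.

```python
def simple_aggregate_index(base_prices, given_prices):
    """Total of given period prices as a percentage of total base period prices."""
    return sum(given_prices) / sum(base_prices) * 100


# Rice quoted per kg., egg per dozen (Rs.).
per_kg = simple_aggregate_index([2.20, 3.10], [3.45, 4.50])

# The same rice quoted per quintal (100 kg.), egg unchanged.
per_quintal = simple_aggregate_index([220.00, 3.10], [345.00, 4.50])

print(round(per_kg, 1), round(per_quintal, 1))
```

Nothing but the unit of one item has changed, yet the index moves, because the item quoted in the larger unit now dominates both totals.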


(1-b) Weighted Aggregate of Prices


This price index is the aggregate of weighted prices of the given
period expressed as a percentage of that of the base period.


Symbolically, $$I_{0n} = \frac{\sum p_n w}{\sum p_0 w} \times 100, \qquad (2)$$


where w′, w″, w‴, ... are the weights of different items,
$\sum p_n w = p_n' w' + p_n'' w'' + p_n''' w''' + \dots$


and $\sum p_0 w = p_0' w' + p_0'' w'' + p_0''' w''' + \dots$


In general, the weights used in this method are quantities
consumed, marketed or sold in the base period or given period or the
average of several periods.
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Various formulas obtained by using various system of weights” 


(A) LaspHyres’ FoRMULA ; 
In this formula base period quantities are used as weights, So, 
substituting w= qo in (2) we have, 


=2Pnfo x 19 aa oes (8) 
Tom Sada * 100: (8) 
(B) PAASOHE’s FoRMULA 


Here given period quantities are used as weights. So, putting Qn 
for w in (2) we obtain " 


Ton =Pndn y 100. = we (4) 
odn 


From the practical point of view Laspeyres' Method is easier
to use since, base year quantities being the weights, it is not
necessary to select a new set of weights as the given period changes.
On the other hand, Paasche's Method involves the selection of new
quantity weights for each period considered, and in most cases the
weights are difficult to obtain so frequently.

It is most common that, if the demand schedules of the
consumers are fixed, the consumers purchase relatively larger
quantities of those articles that have decreased in price relative to
other articles, and relatively smaller quantities of those articles
that have increased in price relative to other articles. For this
reason Laspeyres' Index, which uses base period quantities as
weights, tends to overstate, and Paasche's Index, which uses given
period quantities as weights, tends to understate the change in the
price level. Thus Laspeyres' Index is said to have an upward bias and
to represent the upper limit of the price change, while Paasche's
Index has a downward bias and represents the lower limit of the
price change.


(C) MARSHALL-EDGEWORTH FORMULA

Since Laspeyres' formula has an upward bias and Paasche's
formula has a downward bias, a compromise solution is the
Marshall-Edgeworth formula, which uses the average of the base year
quantities and the given year quantities as weights :

$$I_{0n} = \frac{\sum p_n (q_0 + q_n)/2}{\sum p_0 (q_0 + q_n)/2} \times 100 = \frac{\sum p_n q_0 + \sum p_n q_n}{\sum p_0 q_0 + \sum p_0 q_n} \times 100. \qquad (5)$$


(D) FISHER'S IDEAL FORMULA

Another compromise solution is Fisher's Ideal Formula,
which is the Geometric Average of Laspeyres' formula and Paasche's
formula and is given by

$$I_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100. \qquad (6)$$

This formula is known as 'Ideal', since the upward bias of Laspeyres'
index number is neutralized to a great extent by the downward bias of
Paasche's index number.
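
Formulae (3) to (6) can be sketched in Python as follows. The function names and the two-item data set are hypothetical, chosen only to illustrate the formulae; they are not taken from the text.

```python
from math import sqrt


def aggregate(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres(p0, pn, q0, qn):
    """Formula (3): base period quantities as weights."""
    return aggregate(pn, q0) / aggregate(p0, q0) * 100


def paasche(p0, pn, q0, qn):
    """Formula (4): given period quantities as weights."""
    return aggregate(pn, qn) / aggregate(p0, qn) * 100


def marshall_edgeworth(p0, pn, q0, qn):
    """Formula (5): average of base and given period quantities as weights."""
    numerator = aggregate(pn, q0) + aggregate(pn, qn)
    denominator = aggregate(p0, q0) + aggregate(p0, qn)
    return numerator / denominator * 100


def fisher(p0, pn, q0, qn):
    """Formula (6): geometric mean of Laspeyres' and Paasche's indices."""
    return sqrt(laspeyres(p0, pn, q0, qn) * paasche(p0, pn, q0, qn))


# Hypothetical data for two items (assumed values, for illustration only).
p0, pn = [10, 20], [12, 25]
q0, qn = [5, 3], [4, 4]

L = laspeyres(p0, pn, q0, qn)
P = paasche(p0, pn, q0, qn)
ME = marshall_edgeworth(p0, pn, q0, qn)
F = fisher(p0, pn, q0, qn)
print(round(L, 2), round(P, 2), round(ME, 2), round(F, 2))
```

Being a geometric mean, Fisher's index always lies between Laspeyres' and Paasche's indices.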


EXAMPLE :

Construct the index number for 1960 using 1959 as base year
from the following data using the Weighted Aggregate Method :

Items     Weights     Price per unit (Rs.)
                       1959        1960
A           40        16.00       20.00
B           25        40.00       60.00
C            5         0.50        0.50
D           20         5.12        6.25
E           10         2.00        1.50

SOLUTION :

$$\text{Weighted Aggregate Index Number} = \frac{\sum w p_n}{\sum w p_0} \times 100.$$

Let $p_0$ and $p_n$ represent prices in 1959 and 1960 respectively
and w represent the weights.

Table for Calculation of Index Number

Items     w      p0       pn       w p0       w pn
A        40    16.00    20.00     640.00     800.00
B        25    40.00    60.00    1000.00    1500.00
C         5     0.50     0.50       2.50       2.50
D        20     5.12     6.25     102.40     125.00
E        10     2.00     1.50      20.00      15.00

$\sum w p_0 = 1764.90$, $\sum w p_n = 2442.50$

$$\text{Index Number} = \frac{2442.50}{1764.90} \times 100 = 138.4.$$


EXAMPLE :

Construct Laspeyres', Paasche's, Edgeworth-Marshall's and
Fisher's Ideal Index Numbers from the following data :

Items          Quantity                    Price
          Base year  Current year   Base year  Current year
A            10          12            12          15
B             5          10             8          10
C            12          16            10          12

SOLUTION :
Table for Calculation of Index Numbers

Items    q0    qn    p0    pn    p0q0    pnq0    p0qn    pnqn
A        10    12    12    15     120     150     144     180
B         5    10     8    10      40      50      80     100
C        12    16    10    12     120     144     160     192

$\sum p_0 q_0 = 280$, $\sum p_n q_0 = 344$, $\sum p_0 q_n = 384$, $\sum p_n q_n = 472$

Laspeyres' Index Number $= \frac{344}{280} \times 100 = 122.86.$

Paasche's Index Number $= \frac{472}{384} \times 100 = 122.92.$

Edgeworth-Marshall's Index Number $= \frac{344 + 472}{280 + 384} \times 100 = \frac{816}{664} \times 100 = 122.9.$

Fisher's Ideal Index Number $= \sqrt{122.86 \times 122.92} = 122.89.$


EXAMPLE :
Calculate the Laspeyres', Paasche's and Edgeworth-Marshall's
Index Numbers from the following data :

Items   Base year     Base year       Current year   Current year
        price (p0)    quantity (q0)   price (pn)     quantity (qn)
        (Rs.)         (Kg.)           (Rs.)          (Kg.)
A          5             50              10              56
B          3            100               4             120
C          4             60               6              60
D         11             30              14              24
E          7             40              10              36

SOLUTION :

Calculation of Index Numbers

Items    p0    q0    pn    qn    p0q0    p0qn    pnq0    pnqn
A         5    50    10    56     250     280     500     560
B         3   100     4   120     300     360     400     480
C         4    60     6    60     240     240     360     360
D        11    30    14    24     330     264     420     336
E         7    40    10    36     280     252     400     360

$\sum p_0 q_0 = 1400$, $\sum p_0 q_n = 1396$, $\sum p_n q_0 = 2080$, $\sum p_n q_n = 2096$

Laspeyres' Index Number $= \frac{2080}{1400} \times 100 = 148.6$

Paasche's Index Number $= \frac{2096}{1396} \times 100 = 150.1$

Edgeworth-Marshall's Index Number $= \frac{2080 + 2096}{1400 + 1396} \times 100 = \frac{4176}{2796} \times 100 = 149.4.$


(2a) Simple Average of Price Relatives

It is the average of a set of price relatives obtained by expressing
the given period prices as percentages of the corresponding prices of
the base period.

(i) When A. M. is used :

$$I_{0n} = \frac{1}{N} \sum \frac{p_n}{p_0} \times 100. \qquad (7)$$

(ii) When G. M. is used :

$$I_{0n} = \left( \frac{p_n'}{p_0'} \times \frac{p_n''}{p_0''} \times \frac{p_n'''}{p_0'''} \times \cdots \right)^{1/N} \times 100, \qquad (8)$$

where N is the number of commodities.


(2b) Weighted Average of Price Relatives

If w', w'', w''', ... be the weights, then the weighted average of
price relatives is obtained by multiplying each relative by its weight
and dividing the sum of these products by the sum of the weights. The
formula is as follows :

$$I_{0n} = \frac{\sum w \left( \frac{p_n}{p_0} \right)}{\sum w} \times 100. \qquad (9)$$

Here, in general, the weights are value weights.

(i) When the weights are base period values, then $w = p_0 q_0$ and we
obtain from (9),

$$I_{0n} = \frac{\sum p_0 q_0 \left( \frac{p_n}{p_0} \right)}{\sum p_0 q_0} \times 100 = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100, \qquad (10)$$

which is nothing but Laspeyres' Formula.

(ii) When the weights are given period quantities at base prices,
then $w = p_0 q_n$ and from (9) we get,

$$I_{0n} = \frac{\sum p_0 q_n \left( \frac{p_n}{p_0} \right)}{\sum p_0 q_n} \times 100 = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100, \qquad (11)$$

which is Paasche's Formula.
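
The reductions (10) and (11) can be checked numerically. The Python sketch below uses the three-item data of the earlier worked example (items A, B, C); the function name `weighted_average_of_relatives` is an illustrative choice.

```python
def weighted_average_of_relatives(p0, pn, weights):
    """Formula (9): sum(w * pn/p0) / sum(w) * 100."""
    total = sum(w * (b / a) for a, b, w in zip(p0, pn, weights))
    return total / sum(weights) * 100


def aggregate(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


p0, pn = [12, 8, 10], [15, 10, 12]
q0, qn = [10, 5, 12], [12, 10, 16]

# Case (i): value weights w = p0 * q0 reproduce Laspeyres' formula.
w_base = [p * q for p, q in zip(p0, q0)]
via_relatives_i = weighted_average_of_relatives(p0, pn, w_base)
laspeyres = aggregate(pn, q0) / aggregate(p0, q0) * 100

# Case (ii): weights w = p0 * qn reproduce Paasche's formula.
w_given = [p * q for p, q in zip(p0, qn)]
via_relatives_ii = weighted_average_of_relatives(p0, pn, w_given)
paasche = aggregate(pn, qn) / aggregate(p0, qn) * 100

print(round(via_relatives_i, 2), round(laspeyres, 2))
print(round(via_relatives_ii, 2), round(paasche, 2))
```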


Construction of General Index Number from Group Indices.

When the index numbers of various groups of items are given,
then usually the weighted arithmetic mean of the separate index
numbers will give the index number of all the groups combined. So,
if $I_1, I_2, \ldots, I_n$ be the index numbers of the various groups and
$w_1, w_2, \ldots, w_n$ be the corresponding weights, then the combined
index number of all the groups is given by

(i) When A. M. is used :

$$I = \frac{I_1 w_1 + I_2 w_2 + \cdots + I_n w_n}{w_1 + w_2 + \cdots + w_n}$$


(ii) When G. M. is used :

$$I = \left( I_1^{w_1} \times I_2^{w_2} \times \cdots \times I_n^{w_n} \right)^{1/(w_1 + w_2 + \cdots + w_n)}$$

where n = number of groups.

EXAMPLE :

Given the following data, compute the price index number for the
year 1978 (base year 1970) using (i) simple average, (ii) weighted
average of price relatives.

Items          Price           Weights
           1970     1978
Rice        180      200         11
Wheat       140      170          5
Pulse       480      500          4
Fish         14       16          1

SOLUTION :

Calculation of Price Relatives

Items     p0     pn     w     (pn/p0) x 100     w x (pn/p0) x 100
Rice     180    200    11        111.1               1222.1
Wheat    140    170     5        121.4                607.0
Pulse    480    500     4        104.2                416.8
Fish      14     16     1        114.2                114.2

$\sum \frac{p_n}{p_0} \times 100 = 450.9$, $\sum w \frac{p_n}{p_0} \times 100 = 2360.1$, $\sum w = 21$.

Simple average of Price Relative Index Number $= \frac{450.9}{4} = 112.7.$

Weighted average of Price Relative Index Number $= \frac{2360.1}{21} = 112.4.$


EXAMPLE :

From the following data compute the price index number using the
simple average of price relatives based on (i) A. M. and (ii) G. M. :

Commodities           Price
              Base year   Current year
A                25           30
B                20           22
C                30           33
D                12           15
E                90           99

SOLUTION :

Calculation of Price Index Numbers

Commodities   p0    pn    pn/p0     log p0    log pn    log pn - log p0
A             25    30    30/25     1.3979    1.4771        0.0792
B             20    22    22/20     1.3010    1.3424        0.0414
C             30    33    33/30     1.4771    1.5185        0.0414
D             12    15    15/12     1.0792    1.1761        0.0969
E             90    99    99/90     1.9542    1.9956        0.0414

Total                                                       0.3003

Simple average of price relatives Index Number (using A. M.)

$$= \frac{1}{5} \left( \frac{30}{25} + \frac{22}{20} + \frac{33}{30} + \frac{15}{12} + \frac{99}{90} \right) \times 100 = 115.$$

Simple average of price relatives Index Number (using G. M.)

$$I = \left( \frac{p_n'}{p_0'} \times \frac{p_n''}{p_0''} \times \frac{p_n'''}{p_0'''} \times \frac{p_n^{iv}}{p_0^{iv}} \times \frac{p_n^{v}}{p_0^{v}} \right)^{1/5} \times 100.$$

Taking logarithms,

$$\log I = \frac{1}{5} \sum \log \frac{p_n}{p_0} + \log 100 = \frac{1}{5} \sum (\log p_n - \log p_0) + \log 100$$
$$= \frac{1}{5} \times 0.3003 + 2 = 0.06006 + 2 = 2.06006,$$

so that $I = 114.8$.
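
The two averages in this example can be verified with a short Python sketch (Python 3.8 or later is assumed for `math.prod`). It computes the G.M. directly as the N-th root of the product of the relatives rather than through logarithm tables.

```python
from math import prod

base_prices = [25, 20, 30, 12, 90]
current_prices = [30, 22, 33, 15, 99]

# Price relatives p_n / p_0 for each commodity.
relatives = [pn / p0 for p0, pn in zip(base_prices, current_prices)]
n = len(relatives)

# Formula (7): arithmetic mean of the relatives, times 100.
am_index = sum(relatives) / n * 100

# Formula (8): geometric mean of the relatives, times 100.
gm_index = prod(relatives) ** (1 / n) * 100

print(round(am_index, 1), round(gm_index, 1))
```

The G.M. index never exceeds the A.M. index for the same relatives, which is consistent with the two results here.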
Chain-base Method of Construction of Index Numbers.


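In this method an index is compiled for each year with the previous
year as base, and these indices are chained together so as to get a
series referring back to a fixed base year. Symbolically,

$$I'_{0,n} = I_{0,1} \times I_{1,2} \times \cdots \times I_{n-1,n}$$

where $I_{i,j}$ is the index with i as base and j as given year.
$I_{0,1}, I_{1,2}, I_{2,3}, \ldots$ are called Link Indices and $I'_{0,n}$ is
called the Chain Index.

If Laspeyres' formula is used then, omitting the factor 100, we
have,

$$I_{0,1} = \frac{\sum p_1 q_0}{\sum p_0 q_0}, \quad I_{1,2} = \frac{\sum p_2 q_1}{\sum p_1 q_1}, \quad \ldots, \quad I_{n-1,n} = \frac{\sum p_n q_{n-1}}{\sum p_{n-1} q_{n-1}}$$

and

$$I'_{0,n} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_2 q_1}{\sum p_1 q_1} \times \cdots \times \frac{\sum p_n q_{n-1}}{\sum p_{n-1} q_{n-1}}.$$

And when Paasche's formula is used then, omitting the factor 100,
obviously,

$$I_{0,1} = \frac{\sum p_1 q_1}{\sum p_0 q_1}, \quad I_{1,2} = \frac{\sum p_2 q_2}{\sum p_1 q_2}, \quad \ldots, \quad I_{n-1,n} = \frac{\sum p_n q_n}{\sum p_{n-1} q_n}$$

and

$$I'_{0,n} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_2 q_2}{\sum p_1 q_2} \times \cdots \times \frac{\sum p_n q_n}{\sum p_{n-1} q_n}.$$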

Advantages :

(i) Direct comparison between two successive periods is possible
through the link indices, which is more important to commercial
users than comparison with a remote base period.

(ii) When weights are changing rapidly it is desirable to use
chain-base indices.

(iii) It facilitates the introduction of a new item in place of an
obsolete old one.

Disadvantages :
(i) The amount of calculation involved in this method is immense.

(ii) The chain index is not easy to interpret.


EXAMPLE :

From the following prices (Rs.) of three groups of items, calculate
chain indices with 1974 as the base year, using the A. M. of price
relatives :

Group     1974    1975    1976    1977    1978
1           3       4       5       6       7
2           8      10      12      15      18
3           6       8       5      10      12

SOLUTION :
Calculation of price relatives based on preceding year

Group         1974     1975      1976      1977      1978
1              100    133.33    125.00    120.00    116.67
2              100    125.00    120.00    125.00    120.00
3              100    133.33     62.50    200.00    120.00
Total          300    391.66    307.50    445.00    356.67

Link Index     100    130.55    102.50    148.33    118.89

Calculation of Chain Indices

Year     Link Index     Chain Index
1974       100          100
1975       130.55       (100/100) x 130.55 = 130.55
1976       102.50       (130.55/100) x 102.50 = 133.81
1977       148.33       (133.81/100) x 148.33 = 198.48
1978       118.89       (198.48/100) x 118.89 = 235.97
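
The chaining procedure of this example can be sketched in Python as below. The variable names are illustrative. Because the sketch carries full precision instead of rounding each link index to two decimals, the last digits may differ slightly from a hand calculation.

```python
# Prices (Rs.) of the three groups for 1974..1978, as in the example above.
years = [1974, 1975, 1976, 1977, 1978]
prices = [
    [3, 4, 5, 6, 7],
    [8, 10, 12, 15, 18],
    [6, 8, 5, 10, 12],
]

# Link index for each year: A.M. of price relatives on the preceding year.
link = [100.0]
for t in range(1, len(years)):
    relatives = [row[t] / row[t - 1] * 100 for row in prices]
    link.append(sum(relatives) / len(relatives))

# Chain index: previous chain index multiplied by the current link index.
chain = [100.0]
for t in range(1, len(years)):
    chain.append(chain[-1] * link[t] / 100)

for year, link_index, chain_index in zip(years, link, chain):
    print(year, round(link_index, 2), round(chain_index, 2))
```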


Note. Fixed-base Method and Chain-base Method of Index Numbers when
Paasche's formula is used.

Fixed-base Method :

$$I_{0,1} = \frac{\sum p_1 q_1}{\sum p_0 q_1}, \quad I_{0,2} = \frac{\sum p_2 q_2}{\sum p_0 q_2}, \quad \ldots, \quad I_{0,n} = \frac{\sum p_n q_n}{\sum p_0 q_n}$$

Chain-base Method :

$$I'_{0,1} = \frac{\sum p_1 q_1}{\sum p_0 q_1}, \quad I'_{0,2} = I_{0,1} \times I_{1,2} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_2 q_2}{\sum p_1 q_2}, \quad \ldots$$

$$I'_{0,n} = I_{0,1} \times I_{1,2} \times \cdots \times I_{n-1,n} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_2 q_2}{\sum p_1 q_2} \times \cdots \times \frac{\sum p_n q_n}{\sum p_{n-1} q_n}$$

Consumer Price Index or Cost of Living Index.

Cost of Living Index Numbers are generally intended to
represent the average change, over a period of time, in the prices
paid by the ultimate consumers for a fixed set of goods and services.
They measure the relative change over time in the cost required
to maintain a similar standard of living for a specified class of
people, whose consumption pattern does not vary widely, living in a
specified region within which retail prices are almost equal.


Items contributing to the Consumer Price Index are generally
classified under five major groups :

(i) Food, (ii) Clothing, (iii) Fuel and Light, (iv) Housing,
(v) Miscellaneous.

Each of the groups again includes a large number of items.
For example, the group Food includes Rice, Wheat, Milk, Vegetables,
Egg, Fish/Meat, Tea, Butter/Ghee, etc. The items of consumption
included are those which the people for whom the index is meant
generally consume. However, to save time and labour, the number
of items selected should be limited but representative. The retail
price quotations of the items selected must be easily available and
should be taken at regular intervals from those sources from which
the people obtain their goods and services. If the price quotations
for an item obtained from different sources differ, the average
of the price quotations is to be used.


Since the relative importance of the various items differs between
classes of people, a cost of living index should always be weighted.
Index Numbers are then calculated for each of the five groups by
either of the following methods :

(i) Aggregate Expenditure Method or Aggregative Method.

(ii) Family Budget Method or Method of Weighted Relatives.

(i) Aggregate Expenditure Method : In this method the quantities
of items consumed by the particular group in the base year, or their
proportions, constitute the weights, and the Index Number is the
aggregate of weighted prices of the given year expressed as a
percentage of that in the base year.

Symbolically, Group Index $(I) = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$

which is Laspeyres' Method discussed earlier.


(ii) Family Budget Method : In this method the expenditure
of an average family on various items is estimated after
carefully studying the family budgets of a large number of the people
for whom the Index is to be compiled. This 'family budget inquiry'
is conducted in the base year, using the technique of random
sampling. The expenditure of an average family on each item
constitutes the weight (w) of the item within the group. The weights
are thus value weights (i.e., $p_0 q_0$) and the Index Number is
calculated by using the weighted average of price relatives. Thus,

$$\text{Group Index} = \frac{\sum w \left( \frac{p_n}{p_0} \times 100 \right)}{\sum w} = \frac{\sum p_0 q_0 \left( \frac{p_n}{p_0} \right)}{\sum p_0 q_0} \times 100 = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100.$$

This method thus gives the same result as the Aggregative
Method.

In practice, however, the expenditure of an average family on
each item is expressed as a percentage of the total expenditure
of the group, and the percentage so derived is used as the weight
of the item within the group. Hence, in this case,

$$w = \frac{p_0 q_0}{\sum p_0 q_0} \times 100, \quad \sum w = 100, \quad \text{and we get,}$$

$$\text{Group Index} = \frac{\sum w \left( \frac{p_n}{p_0} \times 100 \right)}{100}.$$

The Cost of Living Index Number is then calculated using the
Weighted Arithmetic Average of Group Indices,

$$\text{C. L. I.} = \frac{\sum I W}{\sum W}$$

where W, the weight of a group index, is the percentage of the total
expenditure of an average family spent on that group.


Uses of Cost of Living Index Numbers.

(1) C. L. I. numbers are used for the adjustment of dearness allowance
to maintain the same standard of living as at the base date.

(2) They are used in fixing wage policy, taxation and a large
number of other economic policies.

(3) The reciprocal of the C. L. I. is used as a measure of the
purchasing power of money. In fact, the purchasing power
of money varies inversely with the C. L. I.

(4) Real wages can be measured by dividing the actual wages
received during a period by the corresponding C. L. I. of that
period.

EXAMPLE :

The following table gives the group index numbers and the
corresponding group weights with reference to the cost of living for a
given year. Construct the overall cost of living index for the year.

Groups             Index Number     Weight
Food                   360            60
Clothing               295             5
Fuel and Light         287             7
House rent             110             8
Miscellaneous          315            20

SOLUTION :
Calculation for C. L. I.

Groups             Index Number (I)    Weight (W)       IW
Food                   360                60           21600
Clothing               295                 5            1475
Fuel and Light         287                 7            2009
House rent             110                 8             880
Miscellaneous          315                20            6300
Total                                    100           32264

$$\text{Cost of Living Index Number} = \frac{\sum IW}{\sum W} = \frac{32264}{100} = 322.64.$$
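
The weighted arithmetic average of group indices used in this example can be written as a small Python sketch; the dictionary layout and names are illustrative choices.

```python
# Group name -> (group index I, group weight W), from the example above.
groups = {
    "Food": (360, 60),
    "Clothing": (295, 5),
    "Fuel and Light": (287, 7),
    "House rent": (110, 8),
    "Miscellaneous": (315, 20),
}

total_iw = sum(index * weight for index, weight in groups.values())
total_w = sum(weight for _, weight in groups.values())

# C. L. I. = sum(I * W) / sum(W)
cost_of_living_index = total_iw / total_w
print(total_iw, total_w, round(cost_of_living_index, 2))
```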


EXAMPLE :

Calculate the change in the Cost of Living figures for 1978 as
compared with 1975.

Items                    Food    Rent    Clothing    Fuel    Miscellaneous
Prices (1975) :           250     60        80        50         200
Prices (1978) :           270     80       100        50         250
Percentage expenditure :   35     20        15        10          20

It is decided by the management of a firm to increase the D.A. of
the workers, who were drawing wages of Rs. 200 per month per worker
in 1975. How much D.A. should be given to them, so that they
are compensated on account of the change in C. L. I. for the year 1978?

SOLUTION :

Let $p_0$ and $p_n$ represent the prices for the years 1975 and 1978
and W represent the percentage expenditure.

Construction of C. L. I. for 1978 with 1975 as base year

Items            W     p0     pn    (pn/p0) x 100    W x (pn/p0) x 100
Food            35    250    270       108.00             3780.0
Rent            20     60     80       133.33             2666.6
Clothing        15     80    100       125.00             1875.0
Fuel            10     50     50       100.00             1000.0
Miscellaneous   20    200    250       125.00             2500.0
Total          100                                       11821.6

$$\text{Cost of Living Index Number} = \frac{11821.60}{100} = 118.22.$$

Thus there is an increase of 18.22% in the cost of living in 1978
as compared to 1975.

Calculation of D.A.

Since the C. L. I. is 118.22, a worker who was drawing Rs. 100
in 1975 should at present draw Rs. 118.22. As the worker was
drawing Rs. 200 in 1975, at present he should draw

$$\frac{118.22}{100} \times 200 = \text{Rs. } 236.44.$$

Hence the D.A. should be increased by Rs. 236.44 - Rs. 200 =
Rs. 36.44 to compensate for the change in C. L. I.
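
The Family Budget calculation and the D.A. adjustment of this example can be sketched in Python as below. The names are illustrative. The sketch does not round the intermediate price relatives, so the D.A. comes out as Rs. 36.43 rather than the Rs. 36.44 obtained above from the rounded index 118.22.

```python
weights = [35, 20, 15, 10, 20]          # percentage expenditure
prices_1975 = [250, 60, 80, 50, 200]    # Food, Rent, Clothing, Fuel, Misc.
prices_1978 = [270, 80, 100, 50, 250]

# Weighted average of price relatives (Family Budget Method).
weighted_relatives = [
    w * (pn / p0 * 100)
    for w, p0, pn in zip(weights, prices_1975, prices_1978)
]
cli = sum(weighted_relatives) / sum(weights)

# Wage needed in 1978 to match the purchasing power of Rs. 200 in 1975.
wage_1975 = 200
required_wage = wage_1975 * cli / 100
dearness_allowance = required_wage - wage_1975

print(round(cli, 2), round(required_wage, 2), round(dearness_allowance, 2))
```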


Quantity Index Numbers.

A Quantity Index Number measures the change in the volume (quantity)
of goods over time. Quantity index number formulae can be obtained
easily by changing p to q and q to p in the corresponding price index
number formulae.


Some important Quantity Index Number formulae

Price Index Numbers and the corresponding Quantity Index Numbers :

(1) $P_{0n} = \frac{\sum p_n}{\sum p_0} \times 100$ ; $Q_{0n} = \frac{\sum q_n}{\sum q_0} \times 100$

(2) $P_{0n} = \frac{1}{N} \sum \frac{p_n}{p_0} \times 100$ ; $Q_{0n} = \frac{1}{N} \sum \frac{q_n}{q_0} \times 100$

(3) $P_{0n} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times 100$ ; $Q_{0n} = \frac{\sum q_n p_0}{\sum q_0 p_0} \times 100$

(4) $P_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times 100$ ; $Q_{0n} = \frac{\sum q_n p_n}{\sum q_0 p_n} \times 100$

(5) $P_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}} \times 100$ ; $Q_{0n} = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}} \times 100$

EXAMPLE :

Given the following data, calculate quantity index numbers using
(i) Laspeyres' formula, (ii) Paasche's formula and (iii) Fisher's
formula.

Items   Base year     Base year       Current year   Current year
        price (p0)    quantity (q0)   price (pn)     quantity (qn)
        (Rs.)         (Kg.)           (Rs.)          (Kg.)
A          5             50              10              56
B          3            100               4             120
C          4             60               6              60
D         11             30              14              24
E          7             40              10              36

SOLUTION :

Calculations for Index Numbers

Items    p0    q0    pn    qn    p0q0    p0qn    pnq0    pnqn
A         5    50    10    56     250     280     500     560
B         3   100     4   120     300     360     400     480
C         4    60     6    60     240     240     360     360
D        11    30    14    24     330     264     420     336
E         7    40    10    36     280     252     400     360

$\sum p_0 q_0 = 1400$, $\sum p_0 q_n = 1396$, $\sum p_n q_0 = 2080$, $\sum p_n q_n = 2096$

Laspeyres' Quantity Index Number $= \frac{\sum q_n p_0}{\sum q_0 p_0} \times 100 = \frac{1396}{1400} \times 100 = 99.7.$

Paasche's Quantity Index Number $= \frac{\sum q_n p_n}{\sum q_0 p_n} \times 100 = \frac{2096}{2080} \times 100 = 100.8.$

Fisher's Ideal Quantity Index Number $= \sqrt{99.7 \times 100.8} = 100.2.$
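
The three quantity indices of this example can be reproduced with a short Python sketch. The helper `value` is an illustrative name for the sum of price times quantity.

```python
from math import sqrt

p0 = [5, 3, 4, 11, 7]        # base year prices
q0 = [50, 100, 60, 30, 40]   # base year quantities
pn = [10, 4, 6, 14, 10]      # current year prices
qn = [56, 120, 60, 24, 36]   # current year quantities


def value(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


# Quantity indices: the roles of p and q are interchanged
# relative to the corresponding price index formulae.
laspeyres_q = value(p0, qn) / value(p0, q0) * 100
paasche_q = value(pn, qn) / value(pn, q0) * 100
fisher_q = sqrt(laspeyres_q * paasche_q)

print(round(laspeyres_q, 1), round(paasche_q, 1), round(fisher_q, 1))
```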
EXAMPLE :
Calculate the Production Index for 1921 with 1910 as base year.

Kind of fuel                            Quantities         Value in 1921
                                      1910       1921     (million rupees)
Bituminous Coal (million tons)       417.10     415.90        1948
Anthracite Coal (million tons)        84.49      90.44         731
Oil (million barrels)                209.60     472.20         712

SOLUTION :

Let $q_0$ be the base year quantity and $q_n$ be the current year
quantity. Then the value in 1921, i.e., the current year, is $p_n q_n$.
Since $p_n q_n$ is given, Paasche's quantity index number formula may be
used. For this we are also to find the product $p_n q_0$. Now, we can
find $p_n$ using the relation

$$p_n = \frac{p_n q_n}{q_n} = \frac{\text{Value in 1921}}{\text{Quantity in 1921}}.$$

Calculation of Production Index Number

Items                               q0        qn      pnqn    pn = pnqn/qn     pnq0
Bituminous coal (million tons)    417.10    415.90    1948        4.68        1952.03
Anthracite coal (million tons)     84.49     90.44     731        8.08         682.68
Oil (million barrels)             209.60    472.20     712        1.52         318.59

$\sum p_n q_n = 3391.00$, $\sum p_n q_0 = 2953.30$

$$\text{Paasche's Quantity Index Number} = \frac{\sum p_n q_n}{\sum p_n q_0} \times 100 = \frac{3391.00}{2953.30} \times 100 = 114.82.$$


Tests of Index Numbers.

To judge the merit of the various index number formulae, noted
economists and statisticians have suggested a number of mathematical
tests. Of these, the Time Reversal Test, the Factor Reversal Test and
the Circular Test are the most important. An index may be considered
'ideal' if it meets these tests.

(1) Time Reversal Test.

"The test is that the formula for calculating the index number
should be such that it will give the same ratio between one point of
comparison and the other, no matter which of the two is taken as
base." Symbolically, $I_{0n} \times I_{n0} = 1$, i.e., $I_{0n}$ is the
reciprocal of $I_{n0}$, where $I_{0n}$ is the index for time 'n' on time '0'
as base and $I_{n0}$ is the index for time '0' on time 'n' as base.

The unweighted G. M. of price relatives satisfies this test, for

$$I_{0n} = \left( \frac{p_n'}{p_0'} \times \frac{p_n''}{p_0''} \times \frac{p_n'''}{p_0'''} \times \cdots \right)^{1/N}$$

is the reciprocal of

$$I_{n0} = \left( \frac{p_0'}{p_n'} \times \frac{p_0''}{p_n''} \times \frac{p_0'''}{p_n'''} \times \cdots \right)^{1/N}.$$

The test is obeyed neither by Laspeyres' Method nor by Paasche's
Method.
Taking Paasche's Method, omitting the factor 100, we have,

$$I_{0n} = \frac{\sum p_n q_n}{\sum p_0 q_n}, \qquad I_{n0} = \frac{\sum p_0 q_0}{\sum p_n q_0},$$

but

$$I_{0n} \times I_{n0} = \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum p_0 q_0}{\sum p_n q_0}$$

is not in general equal to 1. Similarly, for Laspeyres' Method,

$$I_{0n} \times I_{n0} = \frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_0 q_n}{\sum p_n q_n}$$

is not in general equal to 1.

Fisher's Index Number satisfies this test. We have, omitting
the factor 100,

$$I_{0n} = \sqrt{\frac{\sum q_n p_n}{\sum q_n p_0} \times \frac{\sum q_0 p_n}{\sum q_0 p_0}} \quad \text{and} \quad I_{n0} = \sqrt{\frac{\sum p_0 q_0}{\sum q_0 p_n} \times \frac{\sum q_n p_0}{\sum q_n p_n}}$$

which gives $I_{0n} \times I_{n0} = 1$.


(2) Factor Reversal Test.

The product of the Price Index Number and the corresponding
Quantity Index Number, obtained by interchanging p and q,
should be the Value Index.

Neither Laspeyres' Method nor Paasche's Method satisfies
this test.

This test is satisfied by Fisher's Index Number. For, omitting
the factor 100, we have,

$$P_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n}}$$

and

$$Q_{0n} = \sqrt{\frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}}$$

which gives,

$$P_{0n} \times Q_{0n} = \sqrt{\frac{\sum p_n q_0}{\sum p_0 q_0} \times \frac{\sum p_n q_n}{\sum p_0 q_n} \times \frac{\sum q_n p_0}{\sum q_0 p_0} \times \frac{\sum q_n p_n}{\sum q_0 p_n}} = \frac{\sum p_n q_n}{\sum p_0 q_0} = V_{0n},$$

where $P_{0n}$, $Q_{0n}$ and $V_{0n}$ are respectively the Price Index Number,
Quantity Index Number and Value Index Number for time 'n' on
time '0' as base.


(3) Circular Test.

This is an extension of the time reversal test. Symbolically,

$$I_{0,1} \times I_{1,2} \times I_{2,3} \times \cdots \times I_{n-1,n} \times I_{n,0} = 1.$$

This test is satisfied only by the G. M. of relatives and by fixed
weight aggregates.

EXAMPLE :

Using the following data, show that Laspeyres' Price Index
formula does not satisfy the time reversal test.

Items      Base year              Current year
         Price    Quantity      Price    Quantity
A           6        50           10        56
B           2       100            2       120
C           4        60            6        60
D          10        30           12        24
E           8        40           12        36

SOLUTION :

Let $p_0$, $p_n$ denote prices in the base year and current year and
$q_0$, $q_n$ the quantities in the base year and current year.

Items    p0    q0    pn    qn    p0q0    pnq0    p0qn    pnqn
A         6    50    10    56     300     500     336     560
B         2   100     2   120     200     200     240     240
C         4    60     6    60     240     360     240     360
D        10    30    12    24     300     360     240     288
E         8    40    12    36     320     480     288     432

$\sum p_0 q_0 = 1360$, $\sum p_n q_0 = 1900$, $\sum p_0 q_n = 1344$, $\sum p_n q_n = 1880$

Omitting the factor 100 from each index,

Laspeyres' Index Number $(I_{0n}) = \frac{\sum p_n q_0}{\sum p_0 q_0} = \frac{1900}{1360} = 1.397$

and, with the current year as base,

$I_{n0} = \frac{\sum p_0 q_n}{\sum p_n q_n} = \frac{1344}{1880} = 0.715.$

$$I_{0n} \times I_{n0} = \frac{1900 \times 1344}{1360 \times 1880} = 0.9987 \neq 1.$$

Hence Laspeyres' formula does not satisfy the time reversal test.


EXAMPLE :

Using the above data, show that Fisher's Ideal formula satisfies the
Factor Reversal Test.

SOLUTION :
Omitting the factor 100 from each index,

Fisher's Ideal Price Index Number $(P_{0n}) = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344}}$

Fisher's Ideal Quantity Index Number $(Q_{0n}) = \sqrt{\frac{1344}{1360} \times \frac{1880}{1900}}$

$$\therefore P_{0n} \times Q_{0n} = \sqrt{\frac{1900}{1360} \times \frac{1880}{1344} \times \frac{1344}{1360} \times \frac{1880}{1900}} = \frac{1880}{1360} = \frac{\sum p_n q_n}{\sum p_0 q_0} = \text{Value Index}.$$

Thus Fisher's Ideal Index formula satisfies the Factor Reversal Test.
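
Both tests can be checked numerically on the data of the two preceding examples. The Python sketch below omits the factor 100 throughout, as the text does; the function names are illustrative.

```python
from math import sqrt

p0 = [6, 2, 4, 10, 8]
q0 = [50, 100, 60, 30, 40]
pn = [10, 2, 6, 12, 12]
qn = [56, 120, 60, 24, 36]


def agg(a, b):
    """Sum of pairwise products, e.g. sum of p * q."""
    return sum(x * y for x, y in zip(a, b))


def laspeyres(p_base, p_given, q_base, q_given):
    return agg(p_given, q_base) / agg(p_base, q_base)


def paasche(p_base, p_given, q_base, q_given):
    return agg(p_given, q_given) / agg(p_base, q_given)


def fisher(p_base, p_given, q_base, q_given):
    return sqrt(laspeyres(p_base, p_given, q_base, q_given)
                * paasche(p_base, p_given, q_base, q_given))


# Time reversal: interchange the base period and the given period.
laspeyres_product = laspeyres(p0, pn, q0, qn) * laspeyres(pn, p0, qn, q0)
fisher_product = fisher(p0, pn, q0, qn) * fisher(pn, p0, qn, q0)

# Factor reversal: interchange prices and quantities to get the quantity index.
fisher_price = fisher(p0, pn, q0, qn)
fisher_quantity = fisher(q0, qn, p0, pn)
value_index = agg(pn, qn) / agg(p0, q0)

print(round(laspeyres_product, 4), round(fisher_product, 4))
print(round(fisher_price * fisher_quantity, 4), round(value_index, 4))
```

The product for Laspeyres' formula is close to 1 for these data but not equal to it, while Fisher's formula gives exactly 1 up to floating-point rounding.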


Purchasing Power of Money.

It is well known that the value of money, also called the purchasing
power of money, changes from time to time with changes in the price
line. When the price line is steady, the value of money is stable.
It goes down with a rising price line and vice versa.

If in a period of reference the Price Index has risen to, say, 140,
then obviously what a rupee will buy is only 100/140 or 5/7th of what
it used to buy in the base period. This means the money value of one
rupee of the base period has dropped to 5/7 = 0.71, or 71 paisa
approximately; or we may say that the purchasing power of a rupee with
reference to the particular base period is now approximately 71 paisa.
Thus it is seen that the purchasing power of money, or value of money,
is inversely proportional to the price index, and at a given period
with reference to the base period it may be measured as follows :

$$\text{Purchasing power of money or value of money} = \frac{100}{\text{Price Index Number}}$$

Again, if the price index for the year, say, 1960 be 110.3
and the price index for the year, say, 1950 be 98.4, then the
purchasing power of a Rupee of 1950 will be (110.3 / 98.4) of 1960.

Thus the purchasing power of a Rupee in any year in terms
of the 1960-Rupee will be given by,

$$\text{Purchasing Power} = \frac{\text{Price Index Number for 1960}}{\text{Price Index Number for the year}}$$

The index number also provides excellent material for
transforming money (nominal) income into real income. Real
income is the equivalent income in terms of the value of money
in the base year. It is obtained by dividing the money income
by an appropriate Price Index. This process is known as 'statistical
deflation'.

Thus,

$$\text{Real Income or Wage} = \frac{\text{Money Income or Wage}}{\text{Price Index Number}} \times 100$$

and the Real Wage or Income Index Number

$$= \frac{\text{Real Wage of current year}}{\text{Real Wage of base year}} \times 100 = \frac{\text{Index of Money Wages}}{\text{Price Index Number}} \times 100.$$


EXAMPLE :

The table below gives the average wages in Rs. per
day of a group of workers from 1947 to 1951 and the consumer
price index for these years. Determine the real wages of the workers.

Year                            1947     1948     1949     1950     1951
Average Wages                   1.19     1.33     1.44     1.57     1.75
Consumer Price Index No.
(1947-49 = 100)                 95.5    102.8    101.8    102.8    111.0

SOLUTION :

$$\text{Real wage} = \frac{\text{Money wage}}{\text{Consumer price index number}} \times 100$$

Calculation of Real Wages

Year    Average Wages    Consumer Price    Real Wages
                         Index No.         (Rs.)
1947        1.19             95.5          (1.19 / 95.5) x 100 = 1.25
1948        1.33            102.8          (1.33 / 102.8) x 100 = 1.29
1949        1.44            101.8          (1.44 / 101.8) x 100 = 1.41
1950        1.57            102.8          (1.57 / 102.8) x 100 = 1.53
1951        1.75            111.0          (1.75 / 111.0) x 100 = 1.58
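
The deflation of money wages in this example can be sketched in Python as follows; the variable names are illustrative.

```python
years = [1947, 1948, 1949, 1950, 1951]
money_wages = [1.19, 1.33, 1.44, 1.57, 1.75]     # Rs. per day
cpi = [95.5, 102.8, 101.8, 102.8, 111.0]         # 1947-49 = 100

# Real wage = money wage / consumer price index * 100
real_wages = [round(w / i * 100, 2) for w, i in zip(money_wages, cpi)]

for year, wage in zip(years, real_wages):
    print(year, wage)
```

Although money wages rise by about 47% between 1947 and 1951, real wages rise by only about 26%, since part of the increase is absorbed by higher prices.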


Uses of Index Numbers.

Price Index Numbers guide businessmen in framing suitable
policies. A stable or slowly rising price line is considered
favourable to the stability of business. When the price line is
high businessmen frame one policy and when it is low they frame a
different one. Price index numbers measure the changes in the
price line so that the changes can be studied and, where possible,
controlled.

Wages and salaries are nowadays adjusted on the
basis of appropriate consumer price index numbers.

A price index is also used as a measure of the purchasing
power of money (value of money), which varies inversely with it.
Money income is often transformed into real income by dividing
it by an appropriate price index.

The index of industrial production is used to study the general
progress of business conditions in the country and to compare
with it the production of one's own business.

Investment index numbers guide economists, speculators
and bankers in various ways. Index numbers can also be used
for measuring changes over space, e.g., the consumer price index of
textile workers in two cities for the same period of time.

Index numbers also have applications in measuring differences in
the level of intelligence among students in different institutions,
in standardizing birth and death rates, and in measuring changes in
the effectiveness of school systems. They are also used to forecast
future trends.

Index numbers are most widely used in the evaluation of
general economic behaviour in the country. For the economy as
a whole, the index numbers of overall business activity are called
'business barometers'.

"Index numbers are to-day one of the most widely used statistical
devices. ... They are used to take the pulse of the economy and
they have come to be used as indicators of inflationary or
deflationary tendencies." - G. Simpson and F. Kafka.


Construction of Wholesale Price Index Number.

The wholesale price index number is intended to measure the relative
variation in the general price level in a given period of time compared
to some base period. To study the relative change in the general
price level, data on the wholesale prices of the commodities
marketed are to be collected. Since it is not possible to include
all the commodities in the index, a sample of commodities
has to be taken in such a way that they are manageable
in number and representative of the tastes, habits and customs of the
people. The items included should be of standardized quality and
are broadly divided into five major groups :

(1) Food ;

(2) Fuel, Light, Power and Lubricants ;

(3) Liquor and Tobacco ;

(4) Industrial Raw Materials ; and

(5) Manufactures : (a) Intermediate Products ; (b) Finished Products.

Each group is again divided into a number of sub-groups. For
example, the group 'Food' is subdivided into three sub-groups, i.e.,
(i) Cereals, (ii) Pulses and (iii) Others. Weights are, in general,
proportionate to the value of the quantities marketed in the base
period. A period of economic stability should be taken as the base
period. Wholesale prices of the commodities included in the index are
to be collected at regular intervals of time from Govt. agencies and
commercial centres. Price quotations are generally taken once a
week and an index number is calculated. The G. M. of the weekly
indices is taken as the index number for the month.

The method for constructing the index number is the Weighted
Arithmetic Mean of Price Relatives. The Geometric Mean of Price
Relatives can also be used in the calculation of the index number.


Index of Industrial Production

The index of industrial production is generally intended to measure
the relative variation in the level of industrial production in a
country in a given period of time compared to some base period.
It is constructed to study the changes in the quantum of production
and not in values. For the compilation of such index numbers,
data on the level of output of various industries
are to be collected. Since it is not possible to include all the
items of industrial production in the index number, a sample of
items has to be taken in such a way that they are manageable
in number and representative in character. The items included
should be of standard quality and are broadly divided into six
major classes :

(1) Textile Industries ;

(2) Mining Industries ;

(3) Metallurgical Industries ;

(4) Mechanical Industries ;

(5) Industries subject to excise duties ; and

(6) Miscellaneous.

Each class is again divided into a number of important items.
For example, the class textile industries includes cotton, woollen,
silk, etc.

The method used for constructing such index numbers is, in
general, the weighted arithmetic mean of production relatives. The
weighted geometric mean of production relatives may also be used in
the construction of this index number.

Weights are selected on the basis of the relative importance of
the different industries. Usually the weights are based on the values
of net output of the different industries during the base period.


Limitations of Index Numbers. 


(1) Index numbers are, in general, based on samples. Items 
and quantities are obtained on the basis of sampling and, since 
sampling is always subject to bias, errors are introduced. 
So efforts must be made to minimise the errors. 

(2) With the passage of time the tastes and habits of the people 
change and as such there is always a risk of a change in the 
qualities of the items. Introduction of items of new quality and the 
obsolescence of others make comparison over a long period unreliable. 

(3) Different methods of construction of index numbers 
give different results, and sometimes the difference is substantial ; 
still, all the indices point in the same direction and the trends 
generally agree unless there are rapid changes in conditions. 

(4) Index numbers can also be manipulated to suit one's 
purpose. A base period having abnormally high profit may be taken 
to show the current profit as very low. To show current prices 
as very high, a base period having abnormally low prices may 
sometimes be chosen. 

(5) Data collected from different sources and regions may not 
always be reliable since the quality, honesty and intelligence of 
all the investigators are never the same. This, together with the 
non-availability of a perfectly normal base period, poses a serious 
limitation of the index number. 


Miscellaneous Examples. 


1. In 1976 the average price of a commodity was 20% more 
than in 1975, but 20% less than in 1974 and it was 50% more 
than in 1977. Reduce the data to price relatives : 


(i) Using 1975 as base ; and 
(ii) With 1974-75 as base average.          [ I. C. W. A. Dec. 1978 ] 
SOLUTION : 
Let the price of 1976 = 100. 
So, the price of 1975 = (100 × 100)/120 = 83.33, the price of 1974 
= (100 × 100)/80 = 125 and the price of 1977 = (100 × 100)/150 = 66.67. 

Average price for two years 1974 & 1975 = (125 + 83.33)/2 = 104.16. 

Calculation of Price Relatives 

Year    Price     P.R. with 1975 as base         P.R. with 1974-75 as base 
1974    125       (125/83.33) × 100 = 150        (125/104.16) × 100 = 120 
1975    83.33     (83.33/83.33) × 100 = 100      (83.33/104.16) × 100 = 80 
1976    100       (100/83.33) × 100 = 120        (100/104.16) × 100 = 96 
1977    66.67     (66.67/83.33) × 100 = 80       (66.67/104.16) × 100 = 64 


2. On the basis of the following data, compute the Index 
Number (i) using simple average of price relatives, and (ii) with 
the help of geometric mean of price relatives : 


Commodities :            A     B     C     D     E 
Prices (in Rs.) 1948 :   10    18    12    20    25 
Prices (in Rs.) 1950 :   12    16    10     8     8 

SOLUTION : 
Let p0 and p1 be the prices in 1948 and 1950 respectively. 

Calculation of Index Numbers 

Commodity   p0    p1    P.R. = (p1/p0) × 100    log [(p1/p0) × 100] 
A           10    12          120.00                 2.0792 
B           18    16           88.89                 1.9488 
C           12    10           83.33                 1.9208 
D           20     8           40.00                 1.6021 
E           25     8           32.00                 1.5051 

Σ (p1/p0) × 100 = 364.22 ;  Σ log [(p1/p0) × 100] = 9.0560 

Index Number (using simple average of P.R.) = 364.22/5 = 72.84. 

Index Number (using G.M. of P.R.) = Antilog [ Σ log {(p1/p0) × 100} / n ] 
                                  = Antilog (9.0560/5) 
                                  = Antilog 1.8112 
                                  = 64.74. 
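The two averages of price relatives used above can be checked with a short sketch. The 1948 prices of D and E are taken as 20 and 25, the values implied by the price relatives 40.00 and 32.00 in the table; the function names are illustrative only.

```python
import math

# Prices read from Example 2 (1948 = base year, 1950 = current year).
# The base prices of D and E are inferred from their price relatives.
base_prices = [10, 18, 12, 20, 25]
current_prices = [12, 16, 10, 8, 8]


def price_relatives(p0, p1):
    """Price relative (p1 / p0) x 100 for each commodity."""
    return [cur / base * 100 for base, cur in zip(p0, p1)]


def simple_average_index(relatives):
    """Index number as the arithmetic mean of the price relatives."""
    return sum(relatives) / len(relatives)


def geometric_mean_index(relatives):
    """Index number as the antilog of the mean of the logs of the relatives."""
    mean_log = sum(math.log10(r) for r in relatives) / len(relatives)
    return 10 ** mean_log


relatives = price_relatives(base_prices, current_prices)
am_index = simple_average_index(relatives)
gm_index = geometric_mean_index(relatives)
print(round(am_index, 2), round(gm_index, 2))
```

The geometric mean computed this way may differ from the tabulated figure in the second decimal place, because the text works with four-figure logarithms.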


3. Construct Fisher's Ideal Index Number for the following data : 

                 1960 (Base Year)        1968 (Current Year) 
Commodity        Price    Quantity       Price    Quantity 
A                  8         6            12         5 
B                 10         5            11         6 
C                  7         8             8         5 

                                                [ I. C. W. A. 1970 ] 
SOLUTION : 

Let p0 and p1 represent the prices, and q0 and q1 the quantities, 
in the base year and the current year respectively. 

Computation of Fisher's Ideal Index Number 

Commodity   p0   q0   p1   q1   p0q0   p1q0   p0q1   p1q1 
A            8    6   12    5    48     72     40     60 
B           10    5   11    6    50     55     60     66 
C            7    8    8    5    56     64     35     40 

Σp0q0 = 154, Σp1q0 = 191, Σp0q1 = 135, Σp1q1 = 166. 

Fisher's Ideal Index Number = √[ (Σp1q0/Σp0q0) × (Σp1q1/Σp0q1) ] × 100 
                            = √[ (191 × 166)/(154 × 135) ] × 100 
                            = 123.5. 
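As a check on the computation, the following sketch evaluates Laspeyres', Paasche's and Fisher's index numbers from the price and quantity data of this example. The tuple layout and the function names are assumptions made for illustration.

```python
import math

# (p0, q0, p1, q1) for commodities A, B, C as given in the problem.
rows = [
    (8, 6, 12, 5),
    (10, 5, 11, 6),
    (7, 8, 8, 5),
]


def laspeyres(rows):
    """Base-year quantities as weights: sum(p1*q0) / sum(p0*q0) x 100."""
    return sum(p1 * q0 for p0, q0, p1, q1 in rows) / sum(
        p0 * q0 for p0, q0, p1, q1 in rows) * 100


def paasche(rows):
    """Current-year quantities as weights: sum(p1*q1) / sum(p0*q1) x 100."""
    return sum(p1 * q1 for p0, q0, p1, q1 in rows) / sum(
        p0 * q1 for p0, q0, p1, q1 in rows) * 100


def fisher(rows):
    """Geometric mean of the Laspeyres and Paasche index numbers."""
    return math.sqrt(laspeyres(rows) * paasche(rows))


fisher_index = fisher(rows)
print(round(laspeyres(rows), 2), round(paasche(rows), 2), round(fisher_index, 1))
```

With the four aggregates 154, 191, 135 and 166 the result is about 123.5.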
4. The prices of agricultural commodities for 1966-67 and 
the month of December, 1970 are given below along with the 
value of these commodities in 1966-67. 

                  Unit           Price               Value of output 
                           1966-67    Dec. 70       in million rupees 
                  Md.       13.75      13.75             8364 
                  Md.        9.70       9.70             2207 
                  Md.        6.03       8.00              876 
Cotton (raw)     784 lb.   466.00     433.00              701 
Tea                          1.25       1.75              534 

                                                [ Delhi Univ. 1975 ] 

Calculate the Weighted Index Number of prices of these 
commodities for Dec. 1970, taking 1966-67 as base. 

SOLUTION : 
As the base period values of output are given, the Index 
Number may be calculated using Weighted A.M. of price relatives, 
weights being the values of output. 

Let p0 and p1 represent the prices in 1966-67 and Dec. 1970 
respectively and w be the value of output. 

Computation of Weighted Index Number 

Commodities    Unit        p0       p1       w    P.R. = (p1/p0) × 100    P.R. × w 
               Md.       13.75    13.75    8364        100.00           8,36,400.00 
               Md.        9.70     9.70    2207        100.00           2,20,700.00 
               Md.        6.03     8.00     876        132.67           1,16,218.92 
Cotton (raw)   784 lb.  466.00   433.00     701         92.92             65,136.92 
Tea                       1.25     1.75     534        140.00             74,760.00 

Σ (price relative × weight) = 13,13,215.84 ;  Σw = 12,682 

Weighted Index Number = Σ (P.R. × w)/Σw = 13,13,215.84/12,682 = 103.55. 
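The weighted arithmetic mean of price relatives can be sketched as below. The first two commodities show a price relative of 100.00 in the worked table, so equal base and current prices are assumed for them; the names of the first three commodities are not legible in the source and are left out.

```python
# (base price p0, current price p1, weight w = value of output in 1966-67)
rows = [
    (13.75, 13.75, 8364),    # price relative 100.00 in the worked table
    (9.70, 9.70, 2207),      # price relative 100.00 in the worked table
    (6.03, 8.00, 876),
    (466.00, 433.00, 701),   # cotton (raw)
    (1.25, 1.75, 534),       # tea
]


def weighted_am_of_price_relatives(rows):
    """Sum of (price relative x weight) divided by the sum of the weights."""
    total_weight = sum(w for p0, p1, w in rows)
    weighted_sum = sum((p1 / p0 * 100) * w for p0, p1, w in rows)
    return weighted_sum / total_weight


index = weighted_am_of_price_relatives(rows)
print(round(index, 2))
```

Because the sketch does not round each price relative to two decimals before weighting, intermediate products differ slightly from the table, but the index agrees to two decimal places.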
5. From the following data, calculate the Wholesale Price Index 
Number for the 5 groups combined.          [ C. U., M. Com. 1970 ] 

Group                        Weight      Index Number 
                                         for week-ending 27. 9. 69 
                                         (Base : 1952-53 = 100) 
Food articles                  50              241 
Liquor and tobacco              2              221 
Fuel, power, etc.               3              204 
Industrial raw materials       16              256 
Manufactured commodities       29              179 

SOLUTION : 

Let W and I represent the Weight and Index Number for 
the week-ending 27. 9. 69 respectively. 

Compilation of Wholesale Price Index Number 

Group                          W       I        IW 
Food articles                  50     241     12,050 
Liquor and tobacco              2     221        442 
Fuel, power, etc.               3     204        612 
Industrial raw materials       16     256      4,096 
Manufactured commodities       29     179      5,191 

ΣW = 100 ;  ΣIW = 22,391 

Wholesale Price Index Number = ΣIW/ΣW = 22,391/100 = 223.91. 


6. The data below show the percentage increases in price of a 
few selected food items and the weights attached to each of 
them. Calculate the Index Number for the food group. 


Food items :            Rice   Wheat   Dal   Ghee   Oil   Spices   Milk 
Weight :                 33     11      8     5      5      3       7 
Percentage increase 
in price :              180    202    115   212    175    517     260 

Food items :            Fish   Vegetable   Refreshments 
Weight :                  9        9           10 
Percentage increase 
in price :              426      332          279 

Using the above food index and the information given below, 
calculate the cost of living index number. 

Group     Food   Clothing   Fuel and Light   Rent   Miscellaneous 
Index      -       310           220         150        300 
Weight     60       5             8            9         18 

                                                [ I. C. W. A. 72 ] 
SOLUTION : 

Calculation for Food Index Number 

Food items      Weight (w)    Percentage increase    Current      Iw 
                              in price               Index (I) 
Rice               33              180                 280       9240 
Wheat              11              202                 302       3322 
Dal                 8              115                 215       1720 
Ghee                5              212                 312       1560 
Oil                 5              175                 275       1375 
Spices              3              517                 617       1851 
Milk                7              260                 360       2520 
Fish                9              426                 526       4734 
Vegetable           9              332                 432       3888 
Refreshments       10              279                 379       3790 

Note : Current Index = Percentage increase + 100 

Σw = 100 ;  ΣIw = 34000 

∴ Food Index Number = ΣIw/Σw = 34000/100 = 340. 

Calculations for C. L. I. 

Group              Index (I)    Weight (w)      Iw 
Food                  340           60        20,400 
Clothing              310            5         1,550 
Fuel and light        220            8         1,760 
Rent                  150            9         1,350 
Miscellaneous         300           18         5,400 

Σw = 100 ;  ΣIw = 30,460. 

Cost of Living Index Number = 30,460/100 = 304.6. 
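The two-stage calculation (food index first, then the cost of living index) may be sketched as follows. The helper name weighted_index is an assumption for illustration; the data are those of the example.

```python
def weighted_index(indices, weights):
    """Weighted arithmetic mean: sum(I * w) / sum(w)."""
    return sum(i * w for i, w in zip(indices, weights)) / sum(weights)


# Stage 1: food group. Current index = percentage increase + 100.
food_weights = [33, 11, 8, 5, 5, 3, 7, 9, 9, 10]
pct_increase = [180, 202, 115, 212, 175, 517, 260, 426, 332, 279]
current_index = [100 + p for p in pct_increase]
food_index = weighted_index(current_index, food_weights)

# Stage 2: all groups (Food, Clothing, Fuel and light, Rent, Miscellaneous).
group_indices = [food_index, 310, 220, 150, 300]
group_weights = [60, 5, 8, 9, 18]
cost_of_living_index = weighted_index(group_indices, group_weights)

print(food_index, round(cost_of_living_index, 1))
```

Since both sets of weights add up to 100, each division is simply by 100, as in the worked solution.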


7. The following table gives the annual income of a person 
and the general price index number for five years : 


Year : 1970 1971 1972 1973 1974 
Income (Rs.) : 3600 4200 5000 5500 6000 
General Price 

Index Number: 100 104 115 160 280 


Determine the real income of the person. 


SOLUTION : 

Real Income = (Actual Income / Price Index) × 100. 

Calculation of Real Income 

Year    Income (Rs.)    Index Number    Real Income (Rs.) 
1970       3600             100         (3600/100) × 100 = 3600.00 
1971       4200             104         (4200/104) × 100 = 4038.46 
1972       5000             115         (5000/115) × 100 = 4347.83 
1973       5500             160         (5500/160) × 100 = 3437.50 
1974       6000             280         (6000/280) × 100 = 2142.86 
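Deflating money income by the price index is a one-line computation; the sketch below repeats it for the five years of the example, taking the 1974 index as 280 as in the problem statement. The dictionary names are illustrative.

```python
# Annual income (Rs.) and general price index number, 1970 = 100.
income = {1970: 3600, 1971: 4200, 1972: 5000, 1973: 5500, 1974: 6000}
price_index = {1970: 100, 1971: 104, 1972: 115, 1973: 160, 1974: 280}

# Real income = (actual income / price index) x 100.
real_income = {year: income[year] / price_index[year] * 100 for year in income}

for year in sorted(real_income):
    print(year, round(real_income[year], 2))
```

Although money income rises every year, real income falls after 1972 because prices rise faster than income.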


8. The index numbers of wholesale prices for the years 1947—54 
with 1947—49=100 are given below. Determine the wholesale 
purchasing power of a Rupee in terms of 1954-Rupee in each of the 
given years, 


Year :                   1947    1948    1949    1950    1951    1952    1953    1954 
Wholesale price index 
(1947-49 = 100) :        96.4   104.4    99.2   103.1   114.8   111.6   110.1   110.3 

                                                        [ C. U. 1963 ] 
SOLUTION : 

We have, purchasing power of a rupee 
        = (Index Number for 1954) / (Index Number for the year). 

Calculation for Purchasing Power of a Rupee 

Year    Wholesale Price Index    Purchasing Power of a Rupee 
                                 (in terms of 1954-Rupee) 
1947          96.4               110.3 ÷  96.4 = 1.14 
1948         104.4               110.3 ÷ 104.4 = 1.06 
1949          99.2               110.3 ÷  99.2 = 1.11 
1950         103.1               110.3 ÷ 103.1 = 1.07 
1951         114.8               110.3 ÷ 114.8 = 0.96 
1952         111.6               110.3 ÷ 111.6 = 0.99 
1953         110.1               110.3 ÷ 110.1 = 1.00 
1954         110.3               110.3 ÷ 110.3 = 1.00 
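The same table can be produced by dividing the index of the reference year by the index of each year. In this sketch the variable reference_year is an illustrative name; changing it expresses the purchasing power in terms of the rupee of any other year.

```python
# Wholesale price index numbers, 1947-49 = 100.
wholesale_index = {
    1947: 96.4, 1948: 104.4, 1949: 99.2, 1950: 103.1,
    1951: 114.8, 1952: 111.6, 1953: 110.1, 1954: 110.3,
}
reference_year = 1954

# Purchasing power of a rupee in terms of the reference-year rupee.
purchasing_power = {
    year: wholesale_index[reference_year] / index
    for year, index in wholesale_index.items()
}

for year in sorted(purchasing_power):
    print(year, round(purchasing_power[year], 2))
```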


9. Calculate the index of industrial production of the following, 
using weighted A.M. of production relatives. 

Items            Production                      Value of Net output (Rs.) 
            1970               1980 
A        1,00,000 litres    1,60,000 litres           3,50,000 
B        1,20,000 kg.       1,50,000 kg.              2,00,000 
C        5,000 m. tons      7,000 m. tons             9,00,000 

SOLUTION : 
Items        Production Relatives 
A            (160000 ÷ 100000) × 100 = 160 
B            (150000 ÷ 120000) × 100 = 125 
C            (7000 ÷ 5000) × 100 = 140 

∴ Index of Industrial Production 
     = (160 × 350000 + 125 × 200000 + 140 × 900000) / (350000 + 200000 + 900000) 
     = 143. 
EXERCISE 11 


1. The average prices of mustard oil per quintal in the years 
1954 to 1958 are given below : 

Year :    1954   1955   1956   1957   1958 
Price :    295    275    300    225    250 

Find the index numbers for all the years taking 1956 as the base year. 
(B. U. 1964)                  [ Ans. 98.3, 91.7, 100, 75, 83.3 ] 


2. Find the Index Number by method of relatives (using A.M.) 
from the following data : 


Commodity      Base Price      Current Price 
Rice               35               42 
Wheat              30               35 
Pulse              40               38 
Fish              105              120 

(6. U. 1973 )                                  [ Ans. 107.5 ] 

3. The following table gives the average wholesale prices in 
rupees for three commodities during the years 1965-70. Construct 
the index numbers for all the years by taking 1965 as the base year. 

Commodity    1965    1966    1967    1968    1969    1970 
A            25.3    30.8    33.4    35.5    35.3    36.0 
B            17.3    14.6     4.9     5.7    17.1    11.6 
C             7.8     5.4     6.7     5.6     7.2    10.2 

[ Ans. 100, 92, 82, 82, 111, 113 ] 


4. Find by A.M. method, the index number from the following 
data : 

Commodity     Weight     Current Price     Base Price 
Cloth           13            250              225 
Wheat           18             26               22 
Rice            25             32               26 
Potato           8             65               70 

(B. U. 1970)                                   [ Ans. 115.5 ] 

5. Given below are the data on prices of some consumer goods 
and the weights attached to the various items. Compute price 
index numbers for the year 1969 (Base 1968 = 100), using (i) simple 
average, and (ii) weighted average of price relatives. 

                        Price (Rs.) 
Items     Unit        1968       1969      Weight 
Wheat     Kg.         0.50       0.75        2 
Milk      Litre       0.60       0.75        5 
Egg       Dozen       2.00       2.40        4 
Sugar     Kg.         1.80       2.10        8 
Shoes     Pair        8.00      10.00        1 

(I. C. W. A. 1979)              [ Ans. (i) 127.4 ; (ii) 123.3 ] 
6. Find the weighted index number, using the following data : 


Items                  Index     Weight 
Food                    152        48 
Clothing                110         5 
Rent                    130        10 
Fuel and lighting       100        12 
Miscellaneous            80        15 
                                        [ Ans. 128.29 ] 


7. Apply Fisher's method and calculate the index number for 1974 
with 1973 as base year from the following data : 


               1973                  1974 
Items     Price   Quantity      Price   Quantity 
A 10 Fat ‘12 3 
B 15 6 20 5 
C 5 5 6 
D 4 4 


[ Ans. 189.9 ] 


8. Construct a suitable index number with the help of the 
following data with 1965 as base : 

Commodity :       Wheat                Rice                 Gram 
Year         Price  Quantity      Price  Quantity      Price  Quantity 
1965          14       15          20       6            4       10 
1969          24       12          27       4            7        8 


[ Ans. Fisher's Index = 161.4 ] 


9. Calculate Laspeyres’ and Paasche’s Index Numbers from the 
following data : 


               Base year                             Current year 
        Quantity   Price per lb. (Rs.)       Quantity   Price per lb. (Rs.) 
Rice      6.0            4.0                   7.0            3.0 
Meat      4.0            4.5                   5.0            5.0 
Tea       0.5            9.0                   1.5            4.0 


[ Ans. Laspeyres' Index = 86.02, Paasche's Index = 81.25 ] 


10. Construct index number of price from the following data 
by applying (i) Laspeyres’ Method, and (ii) Paasche’s Method. 


                  Base year               Current year 
Commodities    Price   Quantity        Price   Quantity 
A                2        8              4        6 
B                5       10              6        5 
C                4       14              5       10 
D                2       19              2       13 


[ Ans. Laspeyres' Index = 125, Paasche's Index = 126.2 ] 


11. In the construction of rural price index number in a 
certain centre following results are obtained : 

                                  Group Index Number 
Group             Weight       Oct. 1966      Nov. 1966 
Food                81            323            344 
Lighting             2            190            186 
Clothing             8            439            397 
Miscellaneous        9            369            377 

Find the rural price index numbers for the two months. 
[ Ans. For Oct. 333.2 ; For Nov. 348.5 ] 

12. Using the data given below calculate price index numbers 
for the year 1958 by (i) Laspeyres' Method, (ii) Paasche's Method, 
and (iii) Fisher's Method with the year 1949 as base. 

Commodity       Price (Rs.)         Quantity ('000 Kg.) 
               1949     1958          1949      1958 
Rice            9.3      4.5           100        90 
Wheat           6.4      3.7            11        10 
Pulse           5.1      2.7             5         3 


[ Ans. 49.1, 49.1, 49.1 ] 


13. Calculate the cost of living index number from the following 
data : 

                         Price 
Items            Base year    Current year    Weights 
Food                30            47             4 
Fuel                 8            12             1 
Clothing            14            18             3 
Rent                22            15             2 
Miscellaneous       25            30             1 


[ Ans. 115.84 ] 


14. In the construction of a certain C. L. I. number, the 
following group index numbers are found. Calculate C. L. I. by 
using (i) the Weighted A.M. and (ii) the Weighted G.M. 

Group                  Index Number     Weight 
Food                       350             5 
Fuel and lighting          200             1 
Clothing                   240             1 
House rent                 160             1 
Miscellaneous              250             2 

[ Ans. C. L. I. using Weighted A.M. = 285 ; 
       C. L. I. using Weighted G.M. = 275.4 ] 

15. Find the index number for the years 1967, 1968 and 1969 
by the chain-base method, with base year 1966, from the following : 
Year :          1966    1967    1968    1969 
Link Index :     100     110    95.5   109.5 
(I. C. W. A. 1969) 
[ Ans. Chain indices, 1967 = 110, 1968 = 105.05, 1969 = 115.03 ] 


16. From the following table calculate the average percentage 
increase in cost in 1945 over 1939 of manufacturing thin grade 
of linoleum to which the data refer : 

Linoleum Production costs 

Item of cost                 Item as % of       % increase in cost of 
                             total cost         items in 1945 over 1939 
Materials                        48                    97 
Direct Labour                    15                    43 
Manufacturing on costs           21                   114 
Administrative costs             16                    10 


(C. U. 1966)                                    [ Ans. 78.55 ] 


17. From the following table calculate Paasche's quantity 
index number for 1969 with 1951 as base : 

Commodity    1951    1969    1969 
A             54     250     540 
B             93      75     825 
C             18      56     448 
D              6       8      56 
E             23      47     141 


(I. C. W. A. 1972)                              [ Ans. 144 ] 


18. Construct Weighted Aggregative Quantity Index Numbers 
taking 1976 as the base : 

Commodity    Average price          Production ('000 tons) 
             per unit (Rs.)       1976      1977      1978 
A                2.00              150       180       192 
B                3.00              240       220       160 
C                0.50             1100      1200      1500 
D                4.50               22        24        27 


[ Ans. 103.53, 103.98 ] 


19. From the following data, prepare quantity index numbers 
for the year 1974 taking 1968 as the base year : 


Year      Commodity I          Commodity II         Commodity III 
        Price  Quantity      Price  Quantity      Price  Quantity 

1968 5 10 8 6 6 8 

1974 4 Le lat) 7 5 4 


(I. C. W. A. 1975)        [ Ans. Laspeyres' Index = 120.68 ; 
        Paasche's Index = 120.62 ; Fisher's Index = 120.65 ] 

20. The average weekly wages for all manufacturing industries 
for a number of months in 1960 are 78.52, 79.71, 78.55, 78.17, 
78.99 ; the corresponding consumer price index numbers are 115, 
114.7, 114.6, 114.6, 114.9. Find the real wages for the different 
months and calculate the percentage change in the real wages 
during the period.                              (C. U. 1968) 
[ Ans. 68.28, 69.49, 68.54, 68.21, 68.75 and 0.69% ] 


21. During a certain period the C. L. I. goes up from 110 
to 200 and the salary of a worker is also raised from Rs. 325 
to Rs. 500. Does the worker really gain, and if so, by how much 
in real terms ? 

[ Ans. No. Real Wage decreases by Rs. 45 ] 

22. From the following prove that Fisher's Ideal Index 
Number satisfies both the Time Reversal Test and the Factor Reversal Test. 

                 Base year               Current year 
Commodity    Price   Quantity        Price   Quantity 
A              6        50            10        56 
B              2       100             2       120 
C              4        60             6        60 
D             10        30            12        24 

23. In a working-class consumer price index number of a particular 
town the weights corresponding to different groups of items were 
as follows : 

Food : 55, Fuel : 15, Clothing : 10, Rent : 8, and Miscellaneous : 12. 

In Oct., 1972, the D. A. was fixed by a Mill of that town 
at 182 percent of the workers which fully compensated for the 
rise in prices of food and rent but did not compensate for anything 
else. Another Mill of the same town paid D. A. of 46.5 percent 
which compensated for the rise in fuel and miscellaneous groups. 
It is known that rise in food is double that of fuel and the 
rise in miscellaneous group is double that of rent. 

Find the rise of food, fuel, rent and miscellaneous groups. 

[ Rise of food : 317.2, Fuel : 158.6, Rent : 94.5, 
Miscellaneous : 189.0 ; assuming no rise in clothing ]. 

24. The following table shows the annual wages of a labourer and 
the price index numbers. Prepare the real wage index numbers for 
the labourer : 

Year :            1967   1968   1969   1970   1971   1972   1973 
Wages in Rs. :     200    240    350    360    360    370    375 
Price Index :      100    160    280    290    300    320    330 

[ Ans. 100, 75, 62.5, 62, 60, 57.8, 56.8 ] 

25. Monthly average wages in different years are as follows : 

Year :            1967   1968   1969   1970   1971   1972   1973 
Wages (Rs.) :      200    240    350    360    360    380    400 
Price Index :      100    160    200    220    230    250    260 

Calculate real wages index numbers. 
(I. C. W. A. 1979)        [ Ans. 100, 80, 87.5, 82, 78.5, 76, 80 ] 


TIME SERIES 


Introduction. 


A time series is defined as data arranged chronologically. It 
is a set of observations taken, usually, at equal intervals of time 
on a variable which is dependent on time. Such a series of 
observations discloses the changes or variations in the value of the 
variable due to changes in time. The sale of a commodity in different 
years, quarterly export of tea, monthly production of cement, rainfall 
in centimetres on different days, number of accidents on various 
days, etc., all give rise to time series data. 

Time series data play an increasingly significant role 
in all kinds of economic, business and commercial activities. They 
enable economists and businessmen to know the changes that 
have taken place in the past and to compare the present activities 
with those of another period. They also show which 
products appear to be gaining or losing ground, and they help 
in predicting, within certain limits, the future behaviour of economic 
and business activities, which is of the greatest importance 
before planning and budgeting are undertaken. Modern 
statistical techniques enable us to know the factors responsible for 
the development in economic and business activities in the past 
and the way the same factors will operate in future. Once the 
regularity of occurrence of such factors over a sufficiently long period 
is established, the prediction of probable future variation, within 
certain limits, is possible. 


Characteristics of a Time Series. 

Time series data may best be studied by plotting them on 
graph paper. When plotted it will be seen that most time 
series data show : 

(i) some movements exhibiting persistent growth or decline, 

(ii) some movements regular and periodic in nature with period 
not more than one year, 

(iii) some fairly regular and periodic with period of duration 
of more than a year, and finally, 

(iv) some irregular, mild or violent movements. 


However, all the series may not show all the movements stated 
above and some series may show some type of movement not 
mentioned above. 


Thus a time series, in general, is the result of four types of 
movements. They are as follows : 


(1) A Secular (basic) Trend. 


This is a smooth, regular and long-term upward or downward 
movement in the data. It reveals the general tendency of the data. 
Almost all the time series show a tendency to increase or to decrease 
gradually over a long period of time. The increase may be due to 
growth of population together with improvement of purchasing power 
of people, advancement in technology, scientific management, personnel 
management, quality control and many other specific factors. A time 
series may show a downward trend due to availability of better 
and cheaper substitute or difficulty in obtaining raw materials or 
decrease in the demand of the product, etc. Some series, however, 
may initially show a period of steady growth, reverse themselves 
and then show a period of decline and vice versa. 


(2) Seasonal Variation. 


This is a short-term periodic movement whose period is not 
longer than a year. It is uniform and regular in nature. This short-term 
movement is mainly due to climatic changes, holidays, social customs, 
trading and other habits of the people, etc. Most business activities 
have slack periods and brisk periods every year, and these are mostly 
related to the changing seasons (weather), holidays, social customs, 
trading and other habits of the people ; they repeat with remarkable 
regularity after a period of time which is not longer than a year. The 
upward and downward movement of the data characterising the slack 
periods and brisk periods of business activity, repeated with 
remarkable regularity with a period of not more than a year, is 
conveniently termed seasonal variation. Sale of woollen goods 
increases during winter and falls during summer, cold drinks have 
a greater sale in the summer, and sales of garments are heavy during 
the Pujas ; all are examples of seasonal variation with a period of 
12 months. But in many cases the period may be a week, a month or 
a quarter. For example, withdrawals from banks are heavy on the first 
day of the month, traffic and telephone calls have a peak during certain 
hours of the day, the number of books borrowed from a library has 
peak periods during some days of the week, etc. 


(3) Cyclical Variation. 


These are oscillatory movements with a period of more than 
a year. Such movements do not ordinarily exhibit regular periodicity. 
In economic and business series they correspond to the business 
cycle. Almost all business and economic activities have four distinct 
phases : Prosperity, Decline, Depression and Recovery ; they constitute 
the business cycle, and these four phases recur with rough regularity 
after a period of time which is longer than a year. These 
four phases are generated by factors other than changes in climate, 
social customs and those which create seasonal variations. Each 
phase of the cycle changes into the phase which follows it. Prosperity 
is followed by Decline until Depression is reached. Then the 
Depression is followed by Recovery and back to Prosperity. This 
type of upward and downward movement, characterising the periods of 
prosperity and depression of business activity and repeated with 
rough regularity with a period of more than a year, 
is termed cyclical variation. 


(4) Irregular Variation. 


These variations are of two types : Catastrophic and Accidental 
(residual). Catastrophic variations are due to specific events such 
as strikes, fires, earthquakes, floods, etc. The accidental variations 
are due to a multiplicity of causes of unknown origin. They are 
minor variations and too small to merit individual consideration. 
These variations may be of random nature. They are the variations 
which are left after all other variations have been accounted for. 

The analysis of time series consists in separating the four 
constituent parts of the series, namely, the trend, the cyclical variation, 
the seasonal variation, and the irregular variations, analysing each 
constituent part separately and then recombining them in order 
to describe the observed variations in the variable of interest. 


The Classical Model. 


In classical time series analysis it is assumed that there is 
either a multiplicative or an additive relationship between the four 
components of the series, that is, it is assumed that any particular 
value of the data is either the product of the individual 
components or the sum of the individual components. 
Symbolically, 
        Y = T × C × S × I     (Multiplicative model) 
or      Y = T + C + S + I     (Additive model) 


where Y is the original data, T is the trend component, S is the 
seasonal component, C is the cyclical component and I is the 
irregular component. 

The additive model, nowadays, is not used in practice as 
it does not suit most of the economic time series. The multiplicative 
model is widely used as it portrays actual experience more 
closely than the additive model. But the ultimate criterion for 
a given situation is to use the model that fits the data best. 
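The difference between the two models can be seen in a small numerical sketch. All figures below are hypothetical: in the multiplicative model C, S and I are ratios to the trend, while in the additive model they are amounts in the same unit as the trend.

```python
def multiplicative_model(T, C, S, I):
    """Y = T x C x S x I, with C, S and I expressed as ratios."""
    return T * C * S * I


def additive_model(T, C, S, I):
    """Y = T + C + S + I, with all components in the unit of the data."""
    return T + C + S + I


# Hypothetical trend value of 200 units.
y_mult = multiplicative_model(T=200, C=1.05, S=0.90, I=1.02)
y_add = additive_model(T=200, C=10, S=-20, I=4)

# Removing the trend: divide under the multiplicative model,
# subtract under the additive model.
detrended_mult = y_mult / 200   # equals C x S x I
detrended_add = y_add - 200     # equals C + S + I

print(round(y_mult, 2), y_add)
```

Under the multiplicative model a seasonal ratio of 0.90 removes more units when the trend is high than when it is low, which is one reason it tends to suit growing economic series.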


Measurement of Trend. 
Four methods are commonly used for measuring the trend : 
(1) Graphic Method ; 
(2) Semi-Average Method ; 
(3) Moving Average Method ; 
(4) Method of Least Squares. 


(1) Graphic Method. 


In this method the series is plotted on graph paper against 
time and a free-hand smooth curve is drawn by inspection between the 
points in such a way that the fluctuations in one direction are 
approximately equal to those in the other direction. This curve will show 
the long-term general tendency of the data, that is, the trend. As the 
time series data are shown along the vertical axis, the vertical distance 
of this curve estimates the trend value for each time period. This 
method is very simple but highly subjective. Because of its 
subjectiveness this method should not in general be used by inexperienced 
persons. However, it has considerable merit in the hands of an 
experienced person and is widely used in applied situations. By this 
method a quick estimate of the trend, both linear and non-linear, is 
obtained. It is always desirable to draw a graph to obtain a preliminary 
knowledge of the nature of trend in the time series data with a 
view to deciding which type of mathematical trend will be 
appropriate. 


(2) Semi-Average Method. 


This method is applied when the trend is linear. In this method 
the whole time series data are divided into two equal parts and the 
average of each part is calculated. When the number of observations is 
even, there will be exactly two equal parts, but when the number of 
observations is odd, the central value is generally neglected when the 
division is made. The two averages are then plotted at the mid-points of 
their respective time intervals and through these two points a straight 
line is drawn. This line will be the required trend line ; the trend value 
for any time is obtained from the ordinate drawn at that point. 


This method can also be applied when the increase or decrease 
is by a constant ratio. In this case, however, instead of the original data 
their logarithms are used as the basis of the calculation. The trend 
value at any time will then be the antilogarithm of the value of the 
ordinate at that point. 
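The steps of the semi-average method may be sketched as follows, assuming equally spaced time points so that the mean of the time values of each half is its mid-point. The yearly figures are hypothetical and the function name is illustrative.

```python
def semi_average_line(times, values):
    """Return (slope, trend) where trend(t) gives the semi-average trend value.

    The series is split into two halves of equal length; when the number
    of observations is odd the central value is left out.
    """
    n = len(values)
    half = n // 2
    t1 = sum(times[:half]) / half          # mid-point of the first half
    y1 = sum(values[:half]) / half         # average of the first half
    t2 = sum(times[n - half:]) / half      # mid-point of the second half
    y2 = sum(values[n - half:]) / half     # average of the second half
    slope = (y2 - y1) / (t2 - t1)

    def trend(t):
        return y1 + slope * (t - t1)

    return slope, trend


# Hypothetical yearly data.
years = [1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978]
sales = [102, 105, 114, 110, 108, 116, 112, 121]

slope, trend = semi_average_line(years, sales)
print(slope, trend(1971), trend(1978))
```

For a series growing by a constant ratio, the same function could be applied to the logarithms of the values and the antilogarithm of trend(t) taken afterwards.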
(3) Moving Average Method 


The simplest method of smoothing out fluctuations and obtaining 
the trend values with a fair degree of accuracy is the moving average 
method. 

Moving averages are a number of arithmetic averages calculated 
from the time series data, each based on a fixed number of consecutive 
observations. If the time series data are yearly, then moving averages 
of period, say, r years are a series of arithmetic averages each of r 
consecutive observations. The first moving average is the average of the 
first r observations and is placed at the time point midway between 
the time points of the first and the r-th observations of the series. 
For the second moving average, drop the first observation and include the 
(r+1)th observation in the calculation of the average. The second 
moving average is thus the average of the second to (r+1)th observations 
of the series and is placed at the middle of the period covering the second 
to (r+1)th years. Similarly, the third moving average is the average of the 
third to (r+2)th observations of the series and is placed at the mid-point 
of the time interval covering the third to (r+2)th years, and so on. 


Since each moving average is placed at the time point midway 
between the time points of the first and the last observations included 
in the calculation of the average, the moving average values do not 
correspond to any of the original periods when the period of the moving 
average is an even number. Hence two-item moving averages of the moving 
averages already obtained have to be calculated so that they correspond 
to the original periods. This process is called recentering. 
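The placing of moving averages, and the recentering needed for an even period, can be illustrated with a short sketch. The figures are the first eight yearly values (1951 to 1958) of the four-yearly example given later in this chapter; the function names are assumptions made for illustration.

```python
def moving_averages(values, r):
    """Simple moving averages of period r, one per run of r consecutive values."""
    return [sum(values[i:i + r]) / r for i in range(len(values) - r + 1)]


def centred_trend(years, values, r):
    """Return {year: trend value}, recentering the averages when r is even."""
    averages = moving_averages(values, r)
    if r % 2 == 1:
        # Odd period: each average already falls on the middle year.
        offset = (r - 1) // 2
    else:
        # Even period: take two-item moving averages of the averages.
        averages = [(a + b) / 2 for a, b in zip(averages, averages[1:])]
        offset = r // 2
    return {years[i + offset]: value for i, value in enumerate(averages)}


years = [1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958]
y = [506, 620, 1036, 673, 588, 696, 1116, 738]

trend_4 = centred_trend(years, y, 4)   # first value falls on 1953
trend_3 = centred_trend(years, y, 3)   # first value falls on 1952
print(trend_4)
```

With a period of four years, two trend values are lost at each end of the series; with a period of three, one is lost at each end.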


The purpose of the moving average method is to smooth out 
cyclical, seasonal and irregular variations of the time series data 
in order to isolate the trend. It is observed that a moving average will 
completely eliminate a fluctuation if the period of the moving average is 
equal to the period of the fluctuation or an integral multiple of it. 
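This property can be verified on an artificial series. The sketch below assumes a strictly linear trend and a strictly periodic fluctuation of period 4 whose values add up to zero over one period; both are hypothetical. A 4-item moving average then reproduces the trend exactly, while a 3-item moving average does not.

```python
def moving_averages(values, r):
    """Simple moving averages of period r."""
    return [sum(values[i:i + r]) / r for i in range(len(values) - r + 1)]


# Hypothetical components: linear trend plus a fluctuation of period 4.
pattern = [3, -1, -4, 2]                      # sums to zero over one period
trend = [10 + 2 * t for t in range(12)]
series = [trend[t] + pattern[t % 4] for t in range(12)]

ma4 = moving_averages(series, 4)
ma3 = moving_averages(series, 3)

# Trend value at the mid-point of each 4-item window (t = i + 1.5).
trend_at_midpoints = [10 + 2 * (i + 1.5) for i in range(len(ma4))]

print(ma4 == trend_at_midpoints)
print([round(v, 2) for v in ma3])
```

In real data the fluctuation is seldom perfectly regular, so in practice the moving average reduces rather than removes it.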


When a series of yearly figures is given, the seasonal fluctuations, 
whose period is generally a year, are automatically excluded from 
the series. The other fluctuation to be removed now is the cyclical 
fluctuation. If the period of the cyclical fluctuation is known, this 
can be eliminated by calculating moving averages taking the period of 
moving average equal to or an integral multiple of the period over 
which the cyclical fluctuations occur. When the period of cyclical 
fluctuation is not obvious then a graph of the actual data is to be 
drawn and the distance between two ‘peaks’ or two ‘depressions’ of the 
graph will be taken as period when the cycle is regular, and this period 
or an integral multiple of this period will be taken as the period of the 
moving average to smooth out the cyclical fluctuations. When the 
period of the cycle is not uniform, the average duration of the cycles or 
an integral multiple of it may be taken as the period. 


When monthly or quarterly figures are given, a twelve 
month or four quarter moving average is called for to smooth out 
seasonal fluctuations. Now if the cyclical fluctuation has a period of, say, 
four years then the moving average with a period of 48 months or 16 
quarters will smooth out both the seasonal and the cyclical fluctuations. 
But if the period of the cyclical fluctuation is, say, 30 months 
then a moving average with a period of 60 months (the least common 
multiple of the period of seasonal fluctuation and the period of 
cyclical fluctuation) is required to smooth out both the seasonal and 
the cyclical fluctuations. 
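The period needed to smooth out both fluctuations is the least common multiple of the two periods, which may be computed as in the following sketch. The function name smoothing_period is illustrative.

```python
from math import gcd


def smoothing_period(seasonal_period, cyclical_period):
    """Least common multiple of the two periods (same time unit for both)."""
    return seasonal_period * cyclical_period // gcd(seasonal_period, cyclical_period)


print(smoothing_period(12, 48))   # monthly data, four-year cycle
print(smoothing_period(4, 16))    # quarterly data, four-year cycle
print(smoothing_period(12, 30))   # monthly data, 30-month cycle
```

The longer the period, the more trend values are lost at the two ends of the series, so a long common multiple is costly for a short series.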

Moving average, in general, cannot eliminate irregular fluctua- 
tions but it only reduces them. 

Thus the moving averages with period same as the period of 
the cycle or its integral multiple will smooth out the seasonal and the 
cyclical fluctuations and give an estimate of the trend. 


The moving average values, sometimes, do not follow data 
which describe a curve unless some weighting scheme is used. The 
usual type of weighting is binomial, which employs binomial 
coefficients as weights. There are other systems of weighting also. 
The main point in favour of weighted moving averages is that they are 
both relatively smooth and sufficiently sensitive. 
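A binomially weighted moving average may be sketched as below. The weights for a period of r items are taken here as the binomial coefficients of order r - 1 (1, 2, 1 for three items; 1, 4, 6, 4, 1 for five), which is one common reading of the scheme; the data are hypothetical.

```python
from math import comb


def binomial_weights(r):
    """Binomial coefficients C(r-1, 0), ..., C(r-1, r-1) used as weights."""
    return [comb(r - 1, k) for k in range(r)]


def weighted_moving_averages(values, weights):
    """Weighted moving averages; each window is divided by the sum of the weights."""
    r = len(weights)
    total = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + r])) / total
        for i in range(len(values) - r + 1)
    ]


data = [4, 8, 6, 10, 8]                      # hypothetical observations
weights = binomial_weights(3)                # [1, 2, 1]
print(weighted_moving_averages(data, weights))
```

The central observation of each window receives the largest weight, so the average follows the data more closely than a simple moving average of the same period.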


MERITS AND DEMERITS 
(1) This method is flexible and not subjective. It is simple to 
understand and easy to adopt. 


(2) This method is appropriate only when the trend is linear. 
If the trend is not linear, the moving average will over- 
estimate or under-estimate the trend value. 

(3) Cyclical fluctuations may be eradicated completely if the
cycles are regular and the period of the moving average is equal
to, or an integral multiple of, the period of fluctuation. In all
other cases moving averages will only reduce them.

(4) Trend values cannot be determined for some periods at the 
beginning and at the end. 

(5) The method cannot be used for forecasting future trend as 
the moving averages assume no definite mathematical law of 
change. 

(6) This method is very sensitive to a few very high and low 
values which the series may contain. 


EXAMPLE : 


Compute five-yearly moving averages from the following : 


Year :          1924  1925  1926  1927  1928  1929  1930  1931  1932
Annual Sales :   6.4   4.3   4.3   3.4   4.4   5.4   3.4   2.4   1.4

SOLUTION :


Calculations for 5-yearly Moving Averages

Year    Annual Sales    5-year Moving    5-year Moving
                            Total           Average
1924        6.4               -                -
1925        4.3               -                -
1926        4.3             22.8             4.56
1927        3.4             21.8             4.36
1928        4.4             20.9             4.18
1929        5.4             19.0             3.80
1930        3.4             17.0             3.40
1931        2.4               -                -
1932        1.4               -                -


[ Working Notes.
First moving total  = 6.4 + 4.3 + 4.3 + 3.4 + 4.4 = 22.8
Second moving total = 4.3 + 4.3 + 3.4 + 4.4 + 5.4 = 21.8
and so on.

First moving average  = 22.8 ÷ 5 = 4.56
Second moving average = 21.8 ÷ 5 = 4.36

and so on. ]
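
The working of this example can be checked with a short sketch. The function name moving_average is an assumption made for illustration; the sales figures are those of the example, and each average belongs to the middle year of its five-year window.

```python
def moving_average(values, period):
    """Return (moving totals, moving averages) for an odd period.

    Entry k covers values[k] to values[k + period - 1]; its average is
    placed against the middle item of that window.
    """
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    averages = [total / period for total in totals]
    return totals, averages


sales = [6.4, 4.3, 4.3, 3.4, 4.4, 5.4, 3.4, 2.4, 1.4]   # 1924 to 1932
totals, averages = moving_average(sales, 5)
for year, total, average in zip(range(1926, 1931), totals, averages):
    print(year, round(total, 1), round(average, 2))
```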


EXAMPLE : 
Calculate the four-yearly moving averages of the following : 


Year :  1951  1952  1953  1954  1955  1956  1957  1958
Y :      506   620  1036   673   588   696  1116   738

Year :  1959  1960  1961  1962  1963  1964  1965
Y :      663   773  1189   818   745   845  1276

Calculation of 4-yearly Moving Averages

Year      Y     4-year    4-year     2-item moving    4-year centred
                moving    moving     total of         moving average
                total     average    col. (4)
(1)      (2)     (3)       (4)          (5)               (6)

1951     506
1952     620
                2835     708.75
1953    1036                          1438.00            719.00
                2917     729.25
1954     673                          1477.50            738.75
                2993     748.25
1955     588                          1516.50            758.25
                3073     768.25
1956     696                          1552.75            776.38
                3138     784.50
1957    1116                          1587.75            793.87
                3213     803.25
1958     738                          1625.75            812.87
                3290     822.50
1959     663                          1663.25            831.62
                3363     840.75
1960     773                          1701.50            850.75
                3443     860.75
1961    1189                          1742.00            871.00
                3525     881.25
1962     818                          1780.50            890.25
                3597     899.25
1963     745                          1820.25            910.12
                3684     921.00
1964     845
1965    1276

col. (4) = col. (3) ÷ 4,  col. (6) = col. (5) ÷ 2

[ Working Notes.
First 4-year moving total  = 506 + 620 + 1036 + 673 = 2835
Second 4-year moving total = 620 + 1036 + 673 + 588 = 2917 and so on.

First 2-item moving total  = 708.75 + 729.25 = 1438.00
Second 2-item moving total = 729.25 + 748.25 = 1477.50 and so on. ]

Calculation of 4-yearly Moving Averages

Year      Y     4-year moving    2-item moving        4-year centred
                total            total of col. (3)    moving average
                                                      = col. (4) ÷ 8
(1)      (2)       (3)               (4)                  (5)

1951     506
1952     620
                  2835
1953    1036                         5752                719.00
                  2917
1954     673                         5910                738.75
                  2993
1955     588                         6066                758.25
                  3073
1956     696                         6211                776.38
                  3138
1957    1116                         6351                793.87
                  3213
1958     738                         6503                812.87
                  3290
1959     663                         6653                831.62
                  3363
1960     773                         6806                850.75
                  3443
1961    1189                         6968                871.00
                  3525
1962     818                         7122                890.25
                  3597
1963     745                         7281                910.12
                  3684
1964     845
1965    1276
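
The centring procedure for an even period can be sketched as follows. The function name centred_moving_average is an assumption made for illustration; it pairs adjacent moving totals and divides by twice the period, which is the same as averaging two successive moving averages.

```python
def centred_moving_average(values, period):
    """Centred moving average for an even period.

    Adjacent moving totals are added in pairs and divided by 2 * period.
    Result k is centred on values[k + period // 2].
    """
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    return [(totals[i] + totals[i + 1]) / (2 * period) for i in range(len(totals) - 1)]


y = [506, 620, 1036, 673, 588, 696, 1116, 738,
     663, 773, 1189, 818, 745, 845, 1276]          # 1951 to 1965
cma = centred_moving_average(y, 4)
for year, value in zip(range(1953, 1964), cma):
    print(year, value)
```

The printed values are unrounded, so a figure such as 776.375 appears in full rather than to two decimal places.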


EXAMPLE : 
From the following series of observations, calculate the 5-yearly
weighted moving average with weights 1, 2, 2, 2, 1 respectively.

Year :          1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
Annual sales
(Rs. '0000) :      2    6    1    5    3    7    2    6    4    8    3

SOLUTION :
Compilation of 5-yearly Weighted Moving Average

Year   Annual sales   5-year weighted moving total           5-year weighted
       (Rs. '0000)                                           moving average
(1)        (2)                     (3)                            (4)

1969        2
1970        6
1971        1         1×2 + 2×6 + 2×1 + 2×5 + 1×3 = 29       29 ÷ 8 = 3.625
1972        5         1×6 + 2×1 + 2×5 + 2×3 + 1×7 = 31       31 ÷ 8 = 3.875
1973        3         1×1 + 2×5 + 2×3 + 2×7 + 1×2 = 33       33 ÷ 8 = 4.125
1974        7         1×5 + 2×3 + 2×7 + 2×2 + 1×6 = 35       35 ÷ 8 = 4.375
1975        2         1×3 + 2×7 + 2×2 + 2×6 + 1×4 = 37       37 ÷ 8 = 4.625
1976        6         1×7 + 2×2 + 2×6 + 2×4 + 1×8 = 39       39 ÷ 8 = 4.875
1977        4         1×2 + 2×6 + 2×4 + 2×8 + 1×3 = 41       41 ÷ 8 = 5.125
1978        8
1979        3

Total weights = 1 + 2 + 2 + 2 + 1 = 8
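
A weighted moving average of this kind may be sketched as below. The function name weighted_moving_average is an assumption made for illustration; the data and weights are those of the example.

```python
def weighted_moving_average(values, weights):
    """Weighted moving average; each result belongs to the middle item
    of its window when the number of weights is odd."""
    width = len(weights)
    total_weight = sum(weights)
    return [
        sum(w * v for w, v in zip(weights, values[i:i + width])) / total_weight
        for i in range(len(values) - width + 1)
    ]


sales = [2, 6, 1, 5, 3, 7, 2, 6, 4, 8, 3]            # 1969 to 1979
wma = weighted_moving_average(sales, [1, 2, 2, 2, 1])
for year, value in zip(range(1971, 1978), wma):
    print(year, value)
```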


(4) Method of Least Squares


The method of least squares is the most objective and widely
used method of determining the trend in time series data. When
the data are plotted on graph paper, it will be seen that all the
points do not lie on a curve, and quite a large number of curves can be
drawn, by inspection, between the points. In order to find the best
fitting curve to the data the method of least squares is followed. This
method defines the best fitting curve to the time series
data as that curve, from all possible curves, for which (i) the sum
of the vertical deviations of the actual (observed) values from the
fitted curve is zero and (ii) the sum of the squared vertical deviations
is minimum, that is, no other curve would have a smaller sum of
squared deviations. A graphical representation of the data is required
to enable a decision to be made as to the particular curve to be
fitted.


I. LINEAR TREND


When the trend is linear the trend equation may be represented
by y = a + bt, and the values of a and b for the line y = a + bt which
minimizes the sum of squares of the vertical deviations of the
actual (observed) values from the straight line are the solutions to
the so-called normal equations :


Σy  = na + bΣt        ... (1)
Σyt = aΣt + bΣt²      ... (2)
where n is the number of paired observations.


The normal equations are obtained by multiplying y = a + bt by the
coefficients of a and b, i.e., by 1 and t, throughout and summing up.


Case I. When the number of years is odd.


When the number of years is odd the calculation will be
simplified by taking the mid-year as origin and one year as unit, and
in that case Σt = 0 and the two normal equations take the form


Σy  = na
Σyt = bΣt²

and hence, a = Σy/n and b = Σyt/Σt².


Case II. When the number of years is even.


When the number of years is even the origin is placed
midway between the two middle years and the unit is taken to be
½ year instead of one year. With this change of origin and scale we
have again Σt = 0 and hence a = Σy/n and b = Σyt/Σt².


EXAMPLE : 
Fit a straight line trend to the following data : 


Year :                 1965  1966  1967  1968  1969  1970  1971
Gross ex-factory
value (Rs. crores) :    672   824   967  1204  1464  1758  2057

and estimate the Gross ex-factory value (Rs. crores) for the year 1975.
[ I. C. W. A., Dec. 1975 ]


SOLUTION : 


Let the straight line trend be represented by the equation
y = a + bt. The values of a and b shall be determined by solving the
normal equations Σy = na + bΣt and Σyt = aΣt + bΣt².


Here, since the number of years is odd, the mid-year, i.e., the year
1968, is taken as origin and one year as unit.

        Gross ex-factory
Year    value (Rs. crores) (y)    t = Year - 1968     t²        yt
1965            672                    -3              9      -2016
1966            824                    -2              4      -1648
1967            967                    -1              1       -967
1968           1204                     0              0          0
1969           1464                     1              1       1464
1970           1758                     2              4       3516
1971           2057                     3              9       6171


Σt = 0, Σt² = 28, Σyt = 6520, n = 7, Σy = 8946
From the normal equations,
8946 = 7a + b × 0      or, 8946 = 7a     or, a = 1278
6520 = a × 0 + b × 28  or, 6520 = 28b    or, b = 232.9
∴ the trend equation is
y = 1278 + 232.9 t with origin at 1968 and t unit 1 year.


The value of t for 1975 will be 7. Hence the estimate for the
year 1975 is


y = 1278 + 232.9 × 7 = 1278 + 1630.3 = 2908.3 (Rs. crores).
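
The same fit can be sketched in code for an odd number of consecutive years. The function name fit_linear_trend is an assumption made for illustration. The sketch keeps b unrounded, so the estimate for 1975 may differ in the first decimal place from hand working that uses b rounded to one decimal.

```python
def fit_linear_trend(years, values):
    """Least-squares line y = a + b*t with t measured from the mean year.

    For an odd number of consecutive years the mean year is the middle
    year, so sum(t) = 0 and the normal equations give a and b directly.
    """
    n = len(years)
    origin = sum(years) / n
    t = [year - origin for year in years]
    a = sum(values) / n
    b = sum(y * ti for y, ti in zip(values, t)) / sum(ti * ti for ti in t)
    return origin, a, b


years = list(range(1965, 1972))
values = [672, 824, 967, 1204, 1464, 1758, 2057]
origin, a, b = fit_linear_trend(years, values)
estimate_1975 = a + b * (1975 - origin)
print(origin, a, round(b, 1), round(estimate_1975, 1))
```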


EXAMPLE :
Fit a straight line trend to the following data :


Year :                  1951  1952  1953  1954  1955  1956
Electricity generated
(million kw hours) :     101   107   113   121   136   148


[ I. C. W. A., Jan. 1967 ]
SOLUTION :


Let the straight line trend be represented by y = a + bt. Here
the number of years is even, so we take the origin at the mid-point
of 1953 and 1954 and t unit = ½ year.

Fitting of Straight Line Trend

Year     Electricity       t = (Year - 1953.5) ÷ ½      t²       yt
         generated (y)
1951        101                     -5                  25     -505
1952        107                     -3                   9     -321
1953        113                     -1                   1     -113
1954        121                      1                   1      121
1955        136                      3                   9      408
1956        148                      5                  25      740
Total       726                      0                  70      330


Normal equations :
Σy  = na + bΣt     or, 726 = 6a + b × 0     or, 726 = 6a    or, a = 121
Σyt = aΣt + bΣt²   or, 330 = a × 0 + b × 70  or, 330 = 70b   or, b = 4.71
∴ the trend equation is


y = 121 + 4.71 t with origin at the mid-point of 1953 and
1954 and t unit = ½ year.


II. FITTING A PARABOLIC TREND

For a parabolic trend the equation is of the form

y = a + bt + ct²

The normal equations to find the constants a, b and c are given
by

Σy   = na + bΣt + cΣt²
Σyt  = aΣt + bΣt² + cΣt³
Σyt² = aΣt² + bΣt³ + cΣt⁴

Taking the mid-point of the period as origin and the unit as one year
when n is odd and ½ year when n is even, we have Σt = 0 and Σt³ = 0
and the normal equations reduce to

Σy   = na + cΣt²
Σyt  = bΣt²
Σyt² = aΣt² + cΣt⁴
which can be solved easily for the three constants a, b and c.
EXAMPLE :
Fit by the method of least squares a parabolic curve to the
following data :


Year :                 1971  1972  1973  1974  1975  1976  1977
Production in tons :     23    20    18    18    14    13    13


SOLUTION :


Let the parabolic curve (trend) be represented by y = a + bt + ct².
Since the number of years is odd the origin is taken at the mid-year, i.e.,
1974, and the unit as one year.


Fitting of Parabolic Trend

         Production    t = Year
Year        (y)         - 1974      t²      t³      t⁴      yt      yt²
1971         23           -3         9     -27      81     -69      207
1972         20           -2         4      -8      16     -40       80
1973         18           -1         1      -1       1     -18       18
1974         18            0         0       0       0       0        0
1975         14            1         1       1       1      14       14
1976         13            2         4       8      16      26       52
1977         13            3         9      27      81      39      117
Total       119            0        28       0     196     -48      488

From the normal equations

Σy   = na + bΣt + cΣt²        ... (i)
Σyt  = aΣt + bΣt² + cΣt³      ... (ii)
Σyt² = aΣt² + bΣt³ + cΣt⁴     ... (iii)


Now, putting the values Σt = 0, Σt² = 28, Σt³ = 0, Σt⁴ = 196,
Σy = 119, Σyt = -48 and Σyt² = 488, we get,
119 = 7a + 28c        ... (iv)
-48 = 28b             ... (v)
488 = 28a + 196c      ... (vi)
Solving the three equations,


a = 16.43,
b = -1.71,
c = 0.143.


Thus,
y = 16.43 - 1.71 t + 0.143 t²
with origin at 1974 and t unit one year.
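
The reduced normal equations of this example can be solved with a short sketch. The function name fit_parabolic_trend is an assumption made for illustration, and it relies on the origin having been chosen so that Σt = 0 and Σt³ = 0.

```python
def fit_parabolic_trend(t, y):
    """Fit y = a + b*t + c*t**2 when sum(t) = 0 and sum(t**3) = 0."""
    n = len(t)
    s_t2 = sum(ti ** 2 for ti in t)
    s_t4 = sum(ti ** 4 for ti in t)
    s_y = sum(y)
    s_yt = sum(yi * ti for yi, ti in zip(y, t))
    s_yt2 = sum(yi * ti ** 2 for yi, ti in zip(y, t))
    b = s_yt / s_t2
    # Solve  s_y = n*a + c*s_t2  and  s_yt2 = a*s_t2 + c*s_t4  for a and c.
    c = (n * s_yt2 - s_t2 * s_y) / (n * s_t4 - s_t2 ** 2)
    a = (s_y - c * s_t2) / n
    return a, b, c


t = [-3, -2, -1, 0, 1, 2, 3]                 # years measured from 1974
y = [23, 20, 18, 18, 14, 13, 13]
a, b, c = fit_parabolic_trend(t, y)
print(round(a, 2), round(b, 2), round(c, 3))
```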


III. OTHER TYPES OF TREND CURVES


(i) The following curve is often found to be a good fit to
time series data :


y = a·b^t (exponential curve)


which is reducible to the linear form by taking logarithms on
both sides.


Thus, log y = log a + t log b.


A straight line trend may be fitted to log y and t to obtain the
least squares estimates of log a and log b, and hence of a and b.
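
The logarithmic reduction can be sketched as below. The function name fit_exponential_trend and the test series are assumptions made for illustration; the series is hypothetical and follows y = 5 × 1.2^t exactly, so the fit should recover a = 5 and b = 1.2.

```python
import math


def fit_exponential_trend(t, y):
    """Fit y = a * b**t by fitting a straight line to log10(y) against t."""
    n = len(t)
    log_y = [math.log10(value) for value in y]
    t_mean = sum(t) / n
    log_mean = sum(log_y) / n
    slope = sum((ti - t_mean) * (li - log_mean) for ti, li in zip(t, log_y)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = log_mean - slope * t_mean
    return 10 ** intercept, 10 ** slope       # a and b


# Hypothetical series lying exactly on y = 5 * 1.2**t.
t = [-2, -1, 0, 1, 2]
y = [5 * 1.2 ** ti for ti in t]
a, b = fit_exponential_trend(t, y)
print(round(a, 4), round(b, 4))
```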


(ii) In special cases more complicated curves are found to be a
good fit to the time series data.


They are 


(i) 1/y = a + b·c^t (Logistic curve)


(ii) y = a·b^(c^t) (Gompertz curve)
(iii) y = a + b·c^t (modified exponential curve).


MERITS AND DEMERITS. 


(i) The mathematical curves fitted to the data are most suitable for
forecasting purposes.

(ii) Being defined by a mathematical equation the method is most 
objective. There is no scope of any subjectiveness in this 
method. 


(iii) This method involves more calculations as compared to other 
methods. 


(iv) It is not flexible in the sense that the addition of some more
values for some additional years changes the
entire calculation of the trend equation.

Measurement of Seasonal Variations.


Methods used in measuring Seasonal Variations are as follows :
(1) Method of Averages.
(2) Ratio to Trend Method.
(3) Moving Average Method.
(4) Link Relative Method. 


(1) Method of Averages 


This method, the simplest one, ignores any trend or any
cyclical fluctuations that may be present in the series. It is applied
only when the series does not contain trend or cyclical fluctuations to
any appreciable extent and has stable, unchanging seasonal fluctuations.
To find the seasonal variations, arrange the data by years and months,
if monthly data are given, and obtain the total of the values for each
month. Calculate the average value for all Januaries, all Februaries,
etc. and then average these averages. If the multiplicative model is used,
then the percentage of each monthly average to the average of monthly
averages will be the seasonal index. Sometimes a slight adjustment
is necessary to make the total of the indices 1200, as in this case 100 per cent
is considered to be the seasonal value for a normal month.


If quarterly, weekly, etc. figures are given instead of monthly figures,
the same procedure explained above will be followed.


Since most time series data contain trend and cyclical
fluctuations, this method is rarely used.


When the additive model is used, the deviations of each monthly
average from the average of monthly averages will be the seasonal
variations.

EXAMPLE : 
Quarterly sales (in Rs. '000) of a company are given below :


Quarters             Years
             1976    1977    1978
I             7.2     7.4     8.4
II            5.0     6.8     6.0
III           7.8     7.4     6.2
IV            9.2     9.0     7.6


Calculate the seasonal indices.
SOLUTION :
As no appreciable trend is noticed in the given data, the method
of quarterly averages is to be used here.


Calculation for Seasonal Indices


Quarter            Year               Total    Average
           1976    1977    1978
I           7.2     7.4     8.4       23.0      7.67
II          5.0     6.8     6.0       17.8      5.93
III         7.8     7.4     6.2       21.4      7.13
IV          9.2     9.0     7.6       25.8      8.60
Total                                          29.33


Average of averages = 29.33 ÷ 4 = 7.33


Seasonal Index for Quarter I   = (7.67 ÷ 7.33) × 100 = 104.6
Seasonal Index for Quarter II  = (5.93 ÷ 7.33) × 100 = 80.9
Seasonal Index for Quarter III = (7.13 ÷ 7.33) × 100 = 97.2
Seasonal Index for Quarter IV  = (8.60 ÷ 7.33) × 100 = 117.3
Note. Since the total of the Seasonal Indices for the 4 quarters is 400, no adjustment
is necessary.
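
The method of averages under the multiplicative model can be sketched as follows. The function name seasonal_indices_by_averages is an assumption made for illustration. The sketch works with unrounded averages, so its indices can differ in the first decimal place from hand working based on averages rounded to two decimals, but they always total 100 times the number of seasons.

```python
def seasonal_indices_by_averages(table):
    """table maps each season to its values over the years."""
    season_means = {season: sum(v) / len(v) for season, v in table.items()}
    grand_mean = sum(season_means.values()) / len(season_means)
    return {season: 100 * mean / grand_mean for season, mean in season_means.items()}


sales = {
    "I": [7.2, 7.4, 8.4],
    "II": [5.0, 6.8, 6.0],
    "III": [7.8, 7.4, 6.2],
    "IV": [9.2, 9.0, 7.6],
}
indices = seasonal_indices_by_averages(sales)
for quarter, index in indices.items():
    print(quarter, round(index, 1))
```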
EXAMPLE : 


Compute the average seasonal movements by the method of averages
for the following data :


Year           Quarters
          I     II    III    IV
1970     30     31     30    33
1971     34     27     18    24
1972     29     30     28    34
1973     31     29     25    30

Calculation for Average Seasonal Movement


Year                     Quarters                  Total
                   I       II      III      IV
1970              30       31       30      33        -
1971              34       27       18      24        -
1972              29       30       28      34        -
1973              31       29       25      30        -
Total            124      117      101     121       463
Average         31.0    29.25    25.25   30.25    115.75
Average Seasonal
Movements       2.06     0.31    -3.69    1.32         0


Average of averages of four quarters = 115.75 ÷ 4 = 28.94.


[ Working Notes :


Average seasonal movements
for quarter I   = 31.0 - 28.94  = 2.06
for quarter II  = 29.25 - 28.94 = 0.31
for quarter III = 25.25 - 28.94 = -3.69
for quarter IV  = 30.25 - 28.94 = 1.31 ]


Note : 1.31 has been arbitrarily changed to 1.32 to make the sum of the
seasonal variations 0.

(2) Ratio to Trend Method.


Since most economic time series data contain trend to
an appreciable extent, the method of averages is to be used after
eliminating the trend from the time series data.


This method involves estimating the trend, eliminating
the trend from the given series and then using the method of averages
to obtain the seasonal indices. This is done as follows :


Estimate the trend by fitting a mathematical curve and eliminate
the trend from the given series by expressing the original data
as percentages of the corresponding trend values. Arrange these
percentages by years and months, if monthly figures are given, and
obtain the average for each month and then average these twelve
monthly averages. The percentage of each monthly average to the
average of monthly averages will be the seasonal index.


The same procedure is to be followed when quarterly, weekly, etc.
figures are available.


(3) Moving Average Method.


This method consists in eliminating the trend and cyclical
components of the time series data and then using the method of
averages to obtain an estimate of the seasonal indices.


APPROACH FOR THIS METHOD :


Eliminate seasonal variations by taking a centred moving average
with a period of 12 months if monthly data are involved. This
moving average will also largely smooth out irregular movements.
If, now, the multiplicative model is used, then the ratios to moving
averages expressed as percentages, that is, the original observations
expressed as percentages of the corresponding 12-month moving averages,
will contain only the seasonal and irregular variations, which have been
eliminated from the moving averages themselves. These percentages are then
arranged by months and the average for each month is calculated.
The seasonal indices are then found by expressing these monthly
averages as percentages of the average of the 12 monthly averages, so
that the sum of these indices is 1200.


Note 1. If the additive model is used then deviations in place of ratios are to be
taken.


Note 2. If quarterly, weekly, etc. figures are given then also the same
procedure will be followed.


EXAMPLE : 


Using the additive model compute seasonal variations of the
following by the method of moving averages and obtain deseasonalised
data for the four quarters of 1973.


Quarterly Output of Paper in million tons


Years       I     II    III    IV
1971       37     38     37    40
1972       41     34     25    31
1973       35     37     35    41
SOLUTION :


Since quarterly figures are given, a centred moving average with
a period of four quarters is necessary.


Calculation of Moving Averages and Deviations


Years and    Quarterly   4-quarterly   2-item     4-quarterly       Deviations
Quarters     output      moving        centred    centred moving    (million
             (million    total         moving     average           tons)
             tons)                     total
   (1)         (2)          (3)          (4)      (5) = (4) ÷ 8     (6) = (2) - (5)

1971  I         37
      II        38
                            152
      III       37                       308          38.5              -1.5
                            156
      IV        40                       308          38.5               1.5
                            152
1972  I         41                       292          36.5               4.5
                            140
      II        34                       271          33.9               0.1
                            131
      III       25                       256          32.0              -7.0
                            125
      IV        31                       253          31.6              -0.6
                            128
1973  I         35                       266          33.2               1.8
                            138
      II        37                       286          35.7               1.3
                            148
      III       35
      IV        41


Calculation of Seasonal Variations


Years/Quarters         Deviations (million tons)            Total
                   I          II        III         IV
1971               -           -       -1.5        1.5         -
1972              4.5         0.1      -7.0       -0.6         -
1973              1.8         1.3        -          -          -
Total             6.3         1.4      -8.5        0.9         -
Average          3.15         0.7     -4.25       0.45       0.05
Adjustment    -0.0125     -0.0125   -0.0125    -0.0125      -0.05
Seasonal
Variations     3.1375      0.6875   -4.2625     0.4375          0

(Adjustment = -0.05 ÷ 4 = -0.0125)
Deseasonalised data for 1973 :
For first quarter     35 - 3.1375 = 31.8625
For second quarter    37 - 0.6875 = 36.3125
For third quarter     35 + 4.2625 = 39.2625
For fourth quarter    41 - 0.4375 = 40.5625

EXAMPLE : 


Compute Seasonal Indices of the following by the use of the ratio-to-
moving-average method.


Production of Paper (in thousand tons)


Years/Quarters      I     II    III    IV
1951              120    118    122   124
1952              124    122    126   129
1953              128    125    131   134
1954              132    129    135   138

SOLUTION :


Years and    Production   4-quarter   4-quarter   Moving      Ratio to
Quarters                  moving      moving      average     moving
                          total       average     centred     average × 100

1951  I         120
      II        118
                            484        121.00
      III       122                               121.50        100.4
                            488        122.00
      IV        124                               122.50        101.2
                            492        123.00
1952  I         124                               123.50        100.4
                            496        124.00
      II        122                               124.62         97.9
                            501        125.25
      III       126                               125.75        100.1
                            505        126.25
      IV        129                               126.62        101.8
                            508        127.00
1953  I         128                               127.62        100.2
                            513        128.25
      II        125                               128.88         97.5
                            518        129.50
      III       131                               130.00        100.8
                            522        130.50
      IV        134                               131.00        102.3
                            526        131.50
1954  I         132                               132.00        100.0
                            530        132.50
      II        129                               133.00         96.9
                            534        133.50
      III       135
      IV        138

Calculation of Seasonal Indices

                  Ratio to moving average × 100


Years/Quarters       I        II       III       IV     Total
1951                 -         -     100.4    101.2        -
1952             100.4      97.9     100.1    101.8        -
1953             100.2      97.5     100.8    102.3        -
1954             100.0      96.9        -        -         -
Total            300.6     292.3     301.3    305.3        -

A.M.             100.2      97.4     100.4    101.8     399.8


Seasonal Indices 100.25    97.44    100.45   101.86     400.0


[ Working Notes :
Average of A.M.'s = 399.8 ÷ 4 = 99.95.
Seasonal Indices :

For first quarter  = (100.2 ÷ 99.95) × 100 = 100.25

For second quarter = (97.4 ÷ 99.95) × 100 = 97.44

and so on.
The sum of the Seasonal Indices for four quarters must be 400. ]


CONVERSION OF ANNUAL TREND TO MONTHLY VALUES :


In time series, annual data are usually employed to compute the
trend. It is sometimes necessary to obtain monthly trend values
instead of yearly values from the trend line.


The annual data employed may refer to annual totals or may
refer to monthly averages for each year, obtained from annual totals by
dividing each total for a year by 12.


(i) When Annual Data are Annual Totals : 


Let y = a + bt be the least squares trend fitted to the annual totals
for some years.


Taking the mid-year as origin and one year as unit when n, the
number of years covered, is odd, we have a = Σy/n, which is the arithmetic
mean of the n yearly totals. This value of a for annual totals, when
taken in monthly terms, would be 1/12th of it. From annual data the term
b is the trend increment for an entire year. Dividing this b by 12 we
have the monthly trend increment in the yearly totals. Since we still
have yearly totals, b should be divided again by 12 to reduce it to
monthly terms.

Thus the monthly trend equation would be

y = a/12 + (b/144) t = c + dt

(origin : mid-year, i.e., June-July ; t unit = 1 month)

Since the number of months in a year is even, the origin is in the
middle of two months. Thus c is the value of the trend at the end of
June. Since monthly trend values should refer to the middle of the
month, the origin should be shifted from the middle of the two months
to the mid-point of any convenient month. If the middle of July is taken
as origin the trend equation for monthly values will be

y = a/12 + (b/144)(t + ½)
(origin : July ; t unit = 1 month)
by shifting the origin half a month later.


Again, when the number of years covered is even, as explained
earlier the t unit will be 6 months and hence the monthly trend
increment here will be b/6, as b is the trend increment for 6 months.
Hence, proceeding in the same way as just described except for the fact
that instead of dividing b by 144 it is now to be divided by 6 × 12 = 72,
the monthly trend equation is

y = a/12 + (b/72) t
(origin : Dec.-January ; t unit = 1 month)
Now if January is taken as origin then the monthly trend equation
becomes

y = a/12 + (b/72)(t + ½)
(origin : January ; t unit = 1 month)
by shifting the origin half a month later.


(ii) When Annual Data are Monthly Averages : 


When the trend has been fitted to annual data which are
monthly averages, it is simply required to divide b, the annual
trend increment, by 12 when the number of years covered is odd and to
divide b by 6 when the number of years covered is even, and then shift
the origin to the middle of any convenient month in the same way as
described in (i).
Thus the monthly trend equation, when the number of years
covered is odd, is

y = a + (b/12)(t + ½)
(origin : July ; t unit = 1 month)
and the monthly trend equation, when the number of years covered is
even, is

y = a + (b/6)(t + ½)
(origin : January ; t unit = 1 month)
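
The four conversion rules can be gathered into one sketch. The function name monthly_trend, its parameter names and the figures used in the calls are assumptions made for illustration; the values of a and b are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
def monthly_trend(a, b, months_from_origin, odd_years=True, annual_totals=True):
    """Monthly trend value from an annual trend line y = a + b*t.

    odd_years=True  : annual t unit was 1 year, monthly origin is mid-July.
    odd_years=False : annual t unit was 6 months, monthly origin is mid-January.
    annual_totals   : True if the line was fitted to annual totals,
                      False if it was fitted to monthly averages.
    """
    months_per_unit = 12 if odd_years else 6
    increment = b / months_per_unit          # trend increment per month
    level = a
    if annual_totals:                        # reduce totals to monthly terms
        level = a / 12
        increment = increment / 12
    return level + increment * (months_from_origin + 0.5)


# Hypothetical trend lines, used only to show the four cases.
print(monthly_trend(1200, 144, 0))                                        # totals, odd
print(monthly_trend(1200, 72, 0, odd_years=False))                        # totals, even
print(monthly_trend(100, 12, 1, annual_totals=False))                     # averages, odd
print(monthly_trend(100, 6, -1, odd_years=False, annual_totals=False))    # averages, even
```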


EXAMPLE : 


Find the Seasonal Indices from the following data by the ratio
to trend method :


Years     Qtr. I    Qtr. II    Qtr. III    Qtr. IV
1971        45         60         54          51
1972        51         78         75          66
1973        60         87         81          72
1974        81        114        102          93
1975       120        138        129         123

SOLUTION :

Calculation of Trend by Method of Least Squares

Years    Yearly    Quarterly        t        yt       t²     Trend
         totals    average (y)                               values
1971      210         52.5         -2     -105.0      4        48
1972      270         67.5         -1      -67.5      1        66
1973      300         75.0          0        0        0        84
1974      390         97.5          1       97.5      1       102
1975      510        127.5          2      255.0      4       120


Σy = 420.0, Σyt = 180.0, Σt² = 10, n = 5, Σt = 0
Let y = a + bt be the equation of the straight line trend ; then the
normal equations are given by
Σy  = na + bΣt
Σyt = aΣt + bΣt²


which give
420 = 5a + b × 0      or, 5a = 420     or, a = 84
180 = a × 0 + 10b     or, 10b = 180    or, b = 18


The yearly trend increment = 18
∴ quarterly trend increment = 18 ÷ 4 = 4.5.

Calculation of Quarterly Trend Values : The straight line trend
has been fitted to the annual data, which are quarterly averages. So,
for the year, say, 1971, the trend value for the middle of the year is 48,
that is, the trend value for half of the second quarter and half of the third
quarter is 48. Hence the trend value for the second quarter is
48 - ½ × 4.5 = 48 - 2.25 = 45.75 and for the third quarter is 48 + ½ × 4.5
= 50.25. The trend value for the first quarter is 45.75 - 4.5 = 41.25
and for the fourth quarter 50.25 + 4.5 = 54.75, and so on.


                      Trend Values
Year      Qtr. I     Qtr. II    Qtr. III    Qtr. IV
1971       41.25      45.75       50.25      54.75
1972       59.25      63.75       68.25      72.75
1973       77.25      81.75       86.25      90.75
1974       95.25      99.75      104.25     108.75
1975      113.25     117.75      122.25     126.75


Calculation of Seasonal Indices


                 Ratio to trend expressed as %


Year       Qtr. I     Qtr. II    Qtr. III    Qtr. IV     Total
1971       109.1       131.2      107.6       93.2         -
1972        86.0       122.3      109.8       90.6         -
1973        77.6       106.4       93.9       79.3         -
1974        85.1       114.3       97.8       85.4         -
1975       106.1       117.2      105.5       97.1         -
Total      463.9       591.4      514.6      445.6         -
A.M.       92.78      118.28     102.92      89.12      403.10
Seasonal
Indices     92.1       117.3      102.2       88.4      400.00


[ Average of A.M.'s = 403.10 ÷ 4 = 100.775

Seasonal Indices, for first quarter  = (92.78 ÷ 100.775) × 100 = 92.1

                  for second quarter = (118.28 ÷ 100.775) × 100 = 117.3

and so on. ]


(4) Link Relative Method. 


This method is the most difficult one and is based upon the averages
of Link Relatives. Briefly, the method is as follows :


(1) Calculate the link relatives by expressing the value for a
month, if monthly figures are given, as a percentage of the immediately
preceding month's value. The link relative for the first month
of the first year cannot be obtained.


(2) Arrange the link relatives by months and calculate the
average link relative for each month, preferably using the median.
The arithmetic mean can also be used to calculate the averages. Let
these averages be A1, A2, ......, A12.


(3) Convert these averages into Chain Relatives by relating them
to a common base, that is, the first month, January. The chain relative
for this January is taken as 100 per cent and the chain relative for any
month is now obtained by multiplying the average link relative of that month
by the chain relative of the previous month and dividing the product by 100.
Let the chain relatives be c1, c2, ......, c12, c13, where c13 is the
second chain relative for January.


(4) There will be some difference between c1 and c13, and that
difference is due to the presence of trend. It is, therefore, necessary
to adjust for trend. For this, the difference c13 - c1 is considered to
be the annual trend increment. So, the monthly trend increment is
(c13 - c1) ÷ 12 = c (say). This monthly increment c multiplied by 1, 2, 3, ..., 11
is deducted from the chain relatives c2, c3, ......, c12 respectively to get
the adjusted (corrected) chain relatives.


(5) These adjusted (corrected) chain relatives are then expressed
as percentages of their average to provide the required seasonal indices.
The sum of the seasonal indices will be 1200.


Note. If quarterly, weekly, etc. figures are given, the same procedure
explained above is to be followed.
EXAMPLE :


Calculate seasonal indices by the method of link relatives from
the following data :


Production in tons


Years     1st Qtr.    2nd Qtr.    3rd Qtr.    4th Qtr.
1976        360         364         388         380
1977        364         380         398         388
1978        366         412         436         414
1979        366         416         450         440
SOLUTION :


Seasonal Indices by Method of Link Relatives


                          Link Relatives
Years/Quarters       I         II        III        IV
1976                 -       101.1      106.6      97.9         -
1977               95.8      104.4      104.7      97.5         -
1978               94.3      112.6      105.8      94.9         -
1979               88.4      113.7      108.2      97.8         -
Total             278.5      431.8      425.3     388.1         -
A.M.              92.83     107.95     106.32     97.02         -
C.R.                100     107.95     114.77    111.35      103.36
Adjusted C.R.       100     107.11     113.09    108.83        100
Seasonal
Indices            93.2       99.9      105.4     101.5         -

Steps :


(i) L.R. : (364 ÷ 360) × 100 = 101.1, (388 ÷ 364) × 100 = 106.6, etc.
(ii) C.R. : for 1st Qtr. = 100 (assumed) ;
     for 2nd Qtr. = (100 × 107.95) ÷ 100 = 107.95 ;
     for 3rd Qtr. = (106.32 × 107.95) ÷ 100 = 114.77 ;
     for 4th Qtr. = (114.77 × 97.02) ÷ 100 = 111.35 ;
     for 1st Qtr. (Second C.R.) = (111.35 × 92.83) ÷ 100 = 103.36.


(iii) Adjustment (correction) factor = (103.36 - 100) ÷ 4 = 0.84


Adjusted (corrected) C.R. : for 1st Qtr. = 100 ;
     for 2nd Qtr. = 107.95 - 0.84 = 107.11 ;
     for 3rd Qtr. = 114.77 - 1.68 = 113.09 ;
     for 4th Qtr. = 111.35 - 2.52 = 108.83 ;
     for 1st Qtr. (Second C.R.) = 103.36 - 3.36 = 100.


(iv) A.M. of adjusted C.R.'s = (100 + 107.11 + 113.09 + 108.83) ÷ 4 = 107.26.

Seasonal Indices : for 1st Qtr. = (100 ÷ 107.26) × 100 = 93.2 ;
     for 2nd Qtr. = (107.11 ÷ 107.26) × 100 = 99.9 ;
     for 3rd Qtr. = (113.09 ÷ 107.26) × 100 = 105.4 ;
     for 4th Qtr. = (108.83 ÷ 107.26) × 100 = 101.5.


Total of seasonal indices should be 400.
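
The steps of the link relative method can be sketched in code. The function name link_relative_indices is an assumption made for illustration, and the arithmetic mean is used for averaging although the median may be preferred. The sketch keeps all intermediate figures unrounded, so its results agree with hand working only to about one decimal place.

```python
def link_relative_indices(table):
    """table: one row per year, one value per season, in time order."""
    seasons = len(table[0])
    flat = [value for row in table for value in row]
    # Link relative: each value as a percentage of the preceding value.
    links = {s: [] for s in range(seasons)}
    for i in range(1, len(flat)):
        links[i % seasons].append(100 * flat[i] / flat[i - 1])
    mean_link = [sum(links[s]) / len(links[s]) for s in range(seasons)]
    # Chain relatives, taking the first season as 100.
    chain = [100.0]
    for s in range(1, seasons):
        chain.append(mean_link[s] * chain[-1] / 100)
    second_first = mean_link[0] * chain[-1] / 100    # second C.R. of first season
    step = (second_first - 100) / seasons            # trend correction per season
    adjusted = [chain[s] - s * step for s in range(seasons)]
    mean_adjusted = sum(adjusted) / seasons
    return [100 * value / mean_adjusted for value in adjusted]


production = [
    [360, 364, 388, 380],
    [364, 380, 398, 388],
    [366, 412, 436, 414],
    [366, 416, 450, 440],
]
indices = link_relative_indices(production)
print([round(index, 1) for index in indices])
```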


Note. The averaging process used in all the methods explained above is only to
smooth out the irregular variations as far as possible.


Determination of Cyclical Variations.
irregular variations. However, the trend may be removed first from
the data, giving the trend adjusted data and, next, seasonal variations 
may be eliminated, leaving only the cyclical-irregular variations. 
Another possibility is to obtain the product, for multiplicative model 
(or sum, for additive model) of the trend and seasonal values and to 
eliminate both of these movements at the same time, from the time 
series data, leaving again only the cyclical-irregular variations. 

The cyclical-irregular movements thus obtained may further be 
smoothed by the method of moving average in order to obtain cyclical 
variations. The irregular variations, in general, cannot be eliminated 
completely but can be smoothed, so as to bring a better picture of the 
cyclical variations by the use of short-term weighted moving averages. 


Miscellaneous Examples


1. Calculate a five-year moving average of the production data given
below :


Year :         1929  1930  1931  1932  1933  1934  1935  1936  1937  1938
Production :   1590  1516  1364  1134  1063  1302  1428  1773  1917  1990

(C. U. 1966)
SOLUTION :


Compilation of Five-year Moving Average


Years    Production    5-year moving    5-year moving
                           total           average

1929       1590              -               -
1930       1516              -               -
1931       1364            6667           1333.4
1932       1134            6379           1275.8
1933       1063            6291           1258.2
1934       1302            6700           1340.0
1935       1428            7483           1496.6
1936       1773            8410           1682.0
1937       1917              -               -
1938       1990              -               -

2. The following data give daily sales of a shop observing a five-
day week, over four successive weeks. Determine the period of the
moving average and calculate the moving averages accordingly.
(C. A. 1974)


SOLUTION :


The data show a regular cycle of 5 days and hence the period of the
moving average should be 5 days.


Calculations for 5-day Moving Average


Day    Sales    5-day moving    5-day moving
                   total          average
 1      26           -               -
 2      29           -               -
 3      35          188            37.6
 4      47          188            37.6
 5      51          191            38.2
 6      26          193            38.6
 7      32          192            38.4
 8      37          194            38.8
 9      46          196            39.2
10      53          194            38.8
11      28          193            38.6
12      30          193            38.6
13      36          194            38.8
14      46          194            38.8
15      54          195            39.0
16      28          195            39.0
17      31          195            39.0
18      36          195            39.0
19      46           -               -
20      54           -               -

3. Assuming a four-yearly cycle, calculate the trend by the
method of moying average from the following data : 


Year       : 1941 1942 1943 1944 1945 1946 1947 1948 1949 1950
Production :  464  515  518  467  502  540  557  571  586  612

(I. C. W. A., 1968)

SOLUTION :

Here the cycle has a period of 4 years. So, to obtain the trend
values by the moving average method, the period of the moving average
must be 4 years.

As the period of the moving average is even, it is necessary to centre
the moving averages.

Calculation of Trend Values by Moving Average

Year    Production    4-year moving    2-item moving total    4-year moving average
                          total           of col. (3)              (centred)
 (1)       (2)             (3)               (4)                (5) = (4) ÷ 8
1941       464              -                 -                       -
1942       515            1964                -                       -
1943       518            2002              3966                    495.8
1944       467            2027              4029                    503.6
1945       502            2066              4093                    511.6
1946       540            2170              4236                    529.5
1947       557            2254              4424                    553.0
1948       571            2326              4580                    572.5
1949       586              -                 -                       -
1950       612              -                 -                       -

Each total in column (3) falls between two years ; it is entered here
against the earlier of the two.
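The centring step used for an even period can be sketched in Python as follows. This is an illustrative sketch rather than the book's own procedure; the function name centred_moving_average is an assumption. Two adjacent moving totals are added and divided by twice the period, which is the same as averaging two adjacent moving averages.

```python
def centred_moving_average(values, period):
    """Centred moving averages for an even period."""
    totals = [sum(values[i:i + period]) for i in range(len(values) - period + 1)]
    # Adding two adjacent totals and dividing by 2 * period centres the
    # average on an actual time point instead of between two of them.
    return [(totals[i] + totals[i + 1]) / (2 * period) for i in range(len(totals) - 1)]


# Production figures for 1941-1950 from Example 3.
production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]
trend = centred_moving_average(production, 4)

# The first centred value belongs to the third year, 1943.
for year, value in zip(range(1943, 1949), trend):
    print(year, round(value, 1))
```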
4. Given below is the production of coal in thousands of tons for
the years 1971-75 :

Year       : 1971 1972 1973 1974 1975
Production : 44.5 38.9 38.1 32.6 38.7

Use the method of least squares to fit a line to the data given
above. What is the trend value in the year 1973 ?

[I. C. W. A., 1979]

SOLUTION :

Taking the origin at the year 1973 and the unit of time t = 1 year,
let y = a + bt be the equation of the trend line, where y denotes the
production in thousands of tons. The constants a and b are to be
determined by solving the normal equations, given by

Σy = an + bΣt ;   Σyt = aΣt + bΣt²        ... (1)

Calculations for Linear Trend

Year      t       y      t²      yt
1971     -2     44.5     4     -89.0
1972     -1     38.9     1     -38.9
1973      0     38.1     0       0
1974      1     32.6     1      32.6
1975      2     38.7     4      77.4
Total     0    192.8    10     -17.9

Σt = 0 ; Σy = 192.8 ; Σt² = 10 ; Σyt = -17.9 ; n = 5.

Solving the normal equations :

    Σy = an + bΣt                Σyt = aΣt + bΣt²
    192.8 = a.5 + b.0            -17.9 = a.0 + b.10
or, 192.8 = 5a               or, -17.9 = 10b
or, a = 38.56                or, b = -1.79.

∴ the equation for trend is y = 38.56 - 1.79t, with origin at 1973
and t unit = 1 year.

To find the trend value in the year 1973, put t = 0 in the above
equation, i.e., trend value for 1973 = 38.56 - 1.79 × 0 = 38.56 thousand tons.
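The normal equations can also be solved by a short routine. The sketch below is illustrative only; the function name fit_line is an assumption. It solves the two normal equations for a general choice of t, so it also covers cases where Σt is not zero.

```python
def fit_line(t, y):
    """Solve the normal equations for the trend line y = a + b*t."""
    n = len(t)
    sum_t, sum_y = sum(t), sum(y)
    sum_tt = sum(ti * ti for ti in t)
    sum_ty = sum(ti * yi for ti, yi in zip(t, y))
    b = (n * sum_ty - sum_t * sum_y) / (n * sum_tt - sum_t ** 2)
    a = (sum_y - b * sum_t) / n
    return a, b


# Coal production for 1971-1975 (thousand tons) from Example 4.
years = [1971, 1972, 1973, 1974, 1975]
production = [44.5, 38.9, 38.1, 32.6, 38.7]
t = [year - 1973 for year in years]  # origin at 1973, unit of t = 1 year

a, b = fit_line(t, production)
print(round(a, 2), round(b, 2))
```

With the origin at the middle year, Σt = 0 and the two equations separate, which is why the hand calculation reduces to a = Σy/n and b = Σyt/Σt².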


5. The weights (in lbs.) of a new-born calf are taken at weekly
intervals. Below are the observations for 10 weeks :

Age (x)    :    1     2     3     4     5     6     7     8      9     10
Weight (y) : 52.5  58.7  65.0  70.2  75.4  81.1  87.2  95.5  102.2  108.4

Let y = a + bu, where u = 2x - 11. Use normal equations to
estimate a and b. (Given : the sum of the products of the form uy for
these 10 observations = 1016.8.) Hence obtain the line of best fit of y
on x. Now, write down the average rate of growth of the calf per
week.

[I. C. W. A., 1979]

SOLUTION :

The normal equations for determining the constants a and b are

Σy = na + bΣu             ... (i)
Σyu = aΣu + bΣu²          ... (ii)

Calculation for Trend Equation

  x       y      u = 2x - 11     u²
  1     52.5         -9          81
  2     58.7         -7          49
  3     65.0         -5          25
  4     70.2         -3           9
  5     75.4         -1           1
  6     81.1          1           1
  7     87.2          3           9
  8     95.5          5          25
  9    102.2          7          49
 10    108.4          9          81
Total  796.2          0         330

Number of observations = n = 10. Also, given Σuy = 1016.8.
Putting the values from the table in the normal equations,

796.2 = 10a + b.0,     i.e.,  10a = 796.2
1016.8 = a.0 + 330b,   i.e.,  330b = 1016.8

Solving, a = 79.62 ; b = 3.08.
The line of best fit is then
y = 79.62 + 3.08u
  = 79.62 + 3.08 (2x - 11)
  = 45.74 + 6.16x.
The average rate of growth of the calf per week is 6.16 lbs.
6. Fit a straight line trend by the method of least squares and
estimate the trend values :

Year  : 1961 1962 1963 1964 1965 1966 1967 1968
Value :   80   90   92   83   94   99   92  104

[C. A., 1976]

SOLUTION :

Let y = a + bt be the equation of the straight line trend with origin at
the mid-point of 1964 and 1965, and unit of time t = ½ year.

The normal equations for finding the values of a and b are
Σy = an + bΣt
Σyt = aΣt + bΣt².

Calculations for Straight Line Trend

Year     Value (y)     t      t²      yt
1961        80        -7      49     -560
1962        90        -5      25     -450
1963        92        -3       9     -276
1964        83        -1       1      -83
1965        94         1       1       94
1966        99         3       9      297
1967        92         5      25      460
1968       104         7      49      728
Total      734         0     168      210

Putting the values from the table in the normal equations and
noting that n = 8,

    734 = 8a + b.0 ;        210 = a.0 + 168b
or, 8a = 734 ;          or, 168b = 210
or, a = 91.75 ;         or, b = 1.25

The trend equation is then
y = 91.75 + 1.25t,
with origin at the mid-point of 1964 and 1965 and t unit = ½ year.

Calculation of Trend Values

Trend values for 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968
are obtained by substituting the corresponding values of t, viz.,
-7, -5, -3, -1, 1, 3, 5, 7, in the trend equation.

Trend value for 1961 = 91.75 + 1.25 × (-7) = 83.0
                1962 = 91.75 + 1.25 × (-5) = 85.5
                1963 = 91.75 + 1.25 × (-3) = 88.0
                1964 = 91.75 + 1.25 × (-1) = 90.5
                1965 = 91.75 + 1.25 × 1    = 93.0
                1966 = 91.75 + 1.25 × 3    = 95.5
                1967 = 91.75 + 1.25 × 5    = 98.0
                1968 = 91.75 + 1.25 × 7    = 100.5.
7. Calculate the Seasonal Indices from the following data using
the average method :

Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1974        72             68             80             70
1975        76             70             82             74
1976        74             66             84             80
1977        76             74             84             78
1978        78             74             86             82

(C. A., 1979)

SOLUTION :

Calculation of Seasonal Indices

Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1974        72             68             80             70
1975        76             70             82             74
1976        74             66             84             80
1977        76             74             84             78
1978        78             74             86             82
Total      376            352            416            384

Quarterly average : 1st Quarter = 376/5 = 75.2 ;
                    2nd Quarter = 352/5 = 70.4 ;
                    3rd Quarter = 416/5 = 83.2 ;
                    4th Quarter = 384/5 = 76.8.

Average for all the data = (75.2 + 70.4 + 83.2 + 76.8)/4 = 76.4.

Seasonal Indices : 1st Quarter = (75.2/76.4) × 100 = 98.43 ;
                   2nd Quarter = (70.4/76.4) × 100 = 92.15 ;
                   3rd Quarter = (83.2/76.4) × 100 = 108.90 ;
                   4th Quarter = (76.8/76.4) × 100 = 100.52.

Note. The total of the Seasonal Indices for the 4 quarters being 400, no
adjustment is necessary.
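The method of averages used above can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic; the variable names are illustrative and not from the original text.

```python
# Quarterly figures for 1974-1978 from Example 7.
data = {
    1974: [72, 68, 80, 70],
    1975: [76, 70, 82, 74],
    1976: [74, 66, 84, 80],
    1977: [76, 74, 84, 78],
    1978: [78, 74, 86, 82],
}

# Average of each quarter over all the years.
quarter_means = [sum(column) / len(column) for column in zip(*data.values())]

# Average of the four quarterly averages.
grand_mean = sum(quarter_means) / len(quarter_means)

# Each quarterly average expressed as a percentage of the grand average.
indices = [round(100 * mean / grand_mean, 2) for mean in quarter_means]
print(indices, round(sum(indices), 2))
```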


8. A company estimates its average monthly sales in a
particular year to be Rs. 2,00,000. The Seasonal Indices of the sales
data are as follows :

Month            : Jan.  Feb.  March  April  May  June
Seasonal Indices :  76    77     98    128   137   122

Month            : July  Aug.  Sept.  Oct.  Nov.  Dec.
Seasonal Indices :  101   104   100    102    82    73

Using the information, draw up a monthly sales budget for the
company (assuming that there is no trend).

(C. A., 1978)

SOLUTION :

Average monthly sales = Rs. 2,00,000

Estimated sales for a month
    = (Average Monthly Sales × Monthly Seasonal Index) ÷ Average Seasonal Index

Average of the Seasonal Indices
    = (1/12)(76 + 77 + 98 + 128 + 137 + 122 + 101 + 104 + 100 + 102 + 82 + 73)
    = 1200/12 = 100.

[Since seasonal indices are generally expressed in percentages, the
average seasonal index is 100.]

∴ Monthly Sales Budget :

Month        Seasonal Indices    Estimated Sales (Rs.)
January            76            (2,00,000 × 76) ÷ 100  = 1,52,000
February           77            (2,00,000 × 77) ÷ 100  = 1,54,000
March              98            (2,00,000 × 98) ÷ 100  = 1,96,000
April             128            (2,00,000 × 128) ÷ 100 = 2,56,000
May               137            (2,00,000 × 137) ÷ 100 = 2,74,000
June              122            (2,00,000 × 122) ÷ 100 = 2,44,000
July              101            (2,00,000 × 101) ÷ 100 = 2,02,000
August            104            (2,00,000 × 104) ÷ 100 = 2,08,000
September         100            (2,00,000 × 100) ÷ 100 = 2,00,000
October           102            (2,00,000 × 102) ÷ 100 = 2,04,000
November           82            (2,00,000 × 82) ÷ 100  = 1,64,000
December           73            (2,00,000 × 73) ÷ 100  = 1,46,000

                                            Total = 24,00,000


9. Using a 4-quarterly moving average in respect of the following
data, find (a) the trend, (b) short-term fluctuations, and (c) seasonal
variations :

Year    1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1971        35             86             67            124
1972        38            109             91            176
1973        47            158            104            226
1974        61            177            134            240
1975        72            206            141            307

(C. A. 1977)

SOLUTION :

Calculation of Moving Average (Trend) and Short-term Fluctuations

Year/Quarter   Data   4-quarter   2-item    4-quarter         Short-term
                      moving      moving    moving average    fluctuations
                      total       total     (Trend)
    (1)        (2)     (3)         (4)      (5) = (4) ÷ 8     (6) = (2) - (5)
1971   1        35      -           -            -                 -
       2        86     312          -            -                 -
       3        67     315         627         78.37            -11.37
       4       124     338         653         81.63            +42.37
1972   1        38     362         700         87.50            -49.50
       2       109     414         776         97.00            +12.00
       3        91     423         837        104.63            -13.63
       4       176     472         895        111.88            +64.12
1973   1        47     485         957        119.63            -72.63
       2       158     535        1020        127.50            +30.50
       3       104     549        1084        135.50            -31.50
       4       226     568        1117        139.63            +86.37
1974   1        61     598        1166        145.75            -84.75
       2       177     612        1210        151.25            +25.75
       3       134     623        1235        154.38            -20.38
       4       240     652        1275        159.38            +80.62
1975   1        72     659        1311        163.88            -91.88
       2       206     726        1385        173.13            +32.87
       3       141      -           -            -                 -
       4       307      -           -            -                 -

Each total in column (3) falls between two quarters ; it is entered here
against the earlier of the two.

Year          1st Quarter    2nd Quarter    3rd Quarter    4th Quarter
1971              -              -            -11.37          42.37
1972           -49.50          12.00          -13.63          64.12
1973           -72.63          30.50          -31.50          86.37
1974           -84.75          25.75          -20.38          80.62
1975           -91.88          32.87             -               -
Total         -298.76         101.12          -76.88         273.48
Average        -74.69          25.28          -19.22          68.37
Adjustment      +0.065         +0.065         +0.065          +0.065
Seasonal
Variations     -74.625         25.345         -19.155         68.435

The four averages add up to -0.26, so 0.26 ÷ 4 = 0.065 is added to each
of them to make the seasonal variations total zero.
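The three steps of Example 9 (centred trend, short-term fluctuations, adjusted quarterly averages) can be sketched in Python as below. This is an illustrative sketch with assumed variable names. It keeps the trend values unrounded, so its results differ from the tabulated ones in the second or third decimal place.

```python
# Quarterly data for 1971-1975 from Example 9, in time order.
data = [35, 86, 67, 124, 38, 109, 91, 176, 47, 158,
        104, 226, 61, 177, 134, 240, 72, 206, 141, 307]

# (a) Trend: centred 4-quarter moving average; trend[i] belongs to data[i + 2].
totals = [sum(data[i:i + 4]) for i in range(len(data) - 3)]
trend = [(totals[i] + totals[i + 1]) / 8 for i in range(len(totals) - 1)]

# (b) Short-term fluctuations: actual value minus trend (additive model).
deviations = [data[i + 2] - trend[i] for i in range(len(trend))]

# (c) Seasonal variations: average the deviations quarter by quarter, then
# subtract the mean of the four averages so that they add up to zero.
by_quarter = [[] for _ in range(4)]
for i, deviation in enumerate(deviations):
    by_quarter[(i + 2) % 4].append(deviation)  # index 0 is the 1st quarter
means = [sum(values) / len(values) for values in by_quarter]
correction = sum(means) / 4
seasonal = [mean - correction for mean in means]

print([round(value, 3) for value in seasonal])
```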
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10. Fit a straight line trend to the following series of produc- 
tion data : 
Year : 1960 1961 1962 1963 1964 1965 1966
Y    :   37   38   37   40   41   45   50

Y values being the monthly average production in thousand tons,
what is the monthly trend increment ? Find the monthly trend values
from the fitted equation for January and March of 1961. (C. U.)

SOLUTION :

Here the production is given as a monthly average. Let the
equation of the straight line trend fitted to the yearly data be y = a + bt,
with origin 1963 ; t unit = 1 year and y unit = thousand tons. The
normal equations for estimating a and b are Σy = an + bΣt and
Σyt = aΣt + bΣt².

Computation of Straight Line Trend

Year      y      t     t²      yt
1960     37     -3     9     -111
1961     38     -2     4      -76
1962     37     -1     1      -37
1963     40      0     0        0
1964     41      1     1       41
1965     45      2     4       90
1966     50      3     9      150
Total   288      0    28       57

From the normal equations,

    Σy = an + bΣt             Σyt = aΣt + bΣt²
    288 = 7a + b.0            57 = a.0 + 28b
or, 288 = 7a              or, 57 = 28b
or, a = 41.14             or, b = 2.04.

∴ the trend equation fitted to the yearly data is
y = 41.14 + 2.04t (origin : 1963 ; t unit = 1 year)

Since y represents the monthly average of production for each
year and the unit of t is 1 year, i.e., 12 months, the trend of monthly
average production increases by 2.04 in 12 months, i.e., 2.04 ÷ 12 = 0.17
per month. Hence, the monthly trend increment of production is 0.17
thousand tons.

The trend equation fitted to the yearly data is
y = 41.14 + 2.04t (origin : 1963 ; t unit = 1 year)
The trend equation fitted to the monthly data is
y = 41.14 + 0.17t (origin : June-July, 1963 ; t unit = 1 month)
For estimating monthly trend values, the origin must be shifted
to the middle of a month. If July 1963 is to be taken as origin, then
the origin is to be shifted half a month later.
∴ the monthly trend equation with origin at the middle of July, 1963
is
y = 41.14 + 0.17(t + ½) = 41.225 + 0.17t
(origin : July 1963 ; t unit = 1 month)

Estimation of Monthly Trend Values :
(i) January, 1961 is 30 months behind the origin and putting -30
for t in the trend equation for monthly values, we get
y = 41.225 + 0.17(-30) = 36.125 thousand tons.
(ii) March, 1961 is 28 months behind the origin and hence putting
-28 for t in the monthly trend equation, we have
y = 41.225 + 0.17(-28) = 36.465 thousand tons.

Ans. : (1) Monthly trend increment = 0.17 thousand tons.
       (2) Trend value for January, 1961 = 36.125 thousand tons.
       (3) Trend value for March, 1961 = 36.465 thousand tons.
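The conversion from the yearly to the monthly equation can be sketched as follows. This is an illustrative sketch; the function name monthly_trend is an assumption, and the coefficients are the rounded values a = 41.14 and b = 2.04 obtained above.

```python
# Yearly trend: y = 41.14 + 2.04 t, origin 1963 (middle of the year), t in years.
a_year, b_year = 41.14, 2.04

# y is a monthly average, so only the slope is divided by 12.
b_month = round(b_year / 12, 2)

# The middle of 1963 lies between June and July. Moving the origin half a
# month later, to mid-July 1963, adds half a month's increment to the constant.
a_month = a_year + b_month * 0.5


def monthly_trend(year, month):
    """Trend value for a given month, with t counted in months from July 1963."""
    t = (year - 1963) * 12 + (month - 7)
    return a_month + b_month * t


january_1961 = monthly_trend(1961, 1)  # t = -30
march_1961 = monthly_trend(1961, 3)    # t = -28
print(round(january_1961, 3), round(march_1961, 3))
```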
11. Sales of a company rose from Rs. 39,45,000 to Rs. 46,21,000
from the second quarter to the third quarter. The seasonal indices for
these quarters are 103 and 150 respectively. The owner of the company
holds that it is a losing concern. Analyse the above information for
supporting the owner's view. (C. U. 1967)

SOLUTION :

Since the actual sales of the company for the second quarter
were Rs. 39,45,000 and the seasonal index for that quarter is 103, the
normal quarterly sales would be

Rs. 39,45,000 × 100/103 = Rs. 38,30,097

and the expected sales for the third quarter would be

Rs. 38,30,097 × 150/100 = Rs. 57,45,146

as the seasonal index for the third quarter is 150.

So the actual sales of the third quarter (Rs. 46,21,000) are far less
than the expected sales of the same quarter.

Thus, the owner's view that the company is a losing concern is
justified.

12. Fit an exponential trend y = ab^x to the following data by the
method of least squares and find the trend value for the year 1977.


Year (x)       : 1971 1972 1973 1974 1975
Production
(million tons) :  132  142  157  170  191

SOLUTION :

Since the number of years covered is odd, the origin is taken at
the year 1973, with 1 year as the unit. The exponential trend y = ab^t is
transformed into a linear form by taking logarithms on both sides.
Thus, log y = log a + t log b, or, Y = A + Bt, where Y = log y, A = log a
and B = log b. The constants A and B are to be determined by solving the
normal equations

ΣY = nA + BΣt,   ΣYt = AΣt + BΣt².

Calculations for Fitting Exponential Trend

  x       y     Y = log y    t = x - 1973    t²       Yt
1971     132     2.1206          -2          4     -4.2412
1972     142     2.1523          -1          1     -2.1523
1973     157     2.1959           0          0      0
1974     170     2.2304           1          1      2.2304
1975     191     2.2810           2          4      4.5620

ΣY = 10.9802 ; Σt = 0 ; Σt² = 10 ; ΣYt = 0.3989 ; n = 5.

Substituting the values obtained from the table in the normal
equations,

10.9802 = 5A + 0.B,     i.e.,  5A = 10.9802
0.3989 = 0.A + 10.B,    i.e.,  10B = 0.3989.

Therefore, A = 2.19604, or, log a = 2.19604, or, a = Antilog (2.19604)
= 157.01
and B = 0.03989, or, log b = 0.03989, or, b = Antilog (0.03989)
= 1.0962.
∴ the equation for the exponential trend is y = 157.01 (1.0962)^t, with
origin at 1973 and t unit = 1 year.
To find the trend value for the year 1977, put t = 4 (since the
value of t for the year 1977 is 4) in the equation Y = A + Bt.
The trend value for the year 1977 is given by
Y = 2.19604 + 0.03989 × 4 = 2.35560
or, log y = 2.35560, or, y = Antilog (2.35560) = 226.8 million tons.
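The logarithmic transformation can be checked with Python's math module. This is an illustrative sketch with assumed variable names. It uses full-precision logarithms instead of four-figure tables, so the constant a comes out near 157.05 rather than 157.01, while b and the 1977 value agree with the tabular working to the accuracy shown.

```python
import math

# Production (million tons) for 1971-1975 from Example 12.
production = [132, 142, 157, 170, 191]
t = [-2, -1, 0, 1, 2]  # origin at 1973, unit of t = 1 year

# Fit Y = A + B t to Y = log10(y). Since the t values add up to zero,
# the normal equations give A = (sum of Y) / n and B = (sum of Y t) / (sum of t squared).
Y = [math.log10(value) for value in production]
A = sum(Y) / len(Y)
B = sum(ti * Yi for ti, Yi in zip(t, Y)) / sum(ti * ti for ti in t)

a, b = 10 ** A, 10 ** B
trend_1977 = a * b ** 4  # t = 4 for the year 1977
print(round(a, 2), round(b, 4), round(trend_1977, 1))
```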
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EXERCISE 12 


1. During two consecutive weeks the attendances at an exhibition
are recorded, the numbers being given in 000's :

Week    Sun.  Mon.  Tue.  Wed.  Thu.  Fri.  Sat.
  1      24    55    29    48    52    55    61
  2      27    52    32    43    53    56    65

Calculate a seven-day moving average. (C. A. 1963)

[ Ans. : 46.3, 46.7, 46.3, 46.7, 46.0, 46.1, 46.3, 46.9 ]

2. Compute 4-yearly moving averages from the following :

Year        : 1930 1931 1932 1933 1934 1935 1936 1937
Value (Rs.) :  365  360  355  330  300  330  340  290

Year        : 1938 1939 1940 1941 1942 1943 1944 1945
Value (Rs.) :  280  250  235  255  250  245  225  210

Year        : 1946 1947 1948 1949 1950
Value (Rs.) :  200  230  225  200  195

[ Ans. : 344, 332, 327, 320, 312, 300, 277, 259, 251, 247, 245, 238,
226, 218, 216, 215, 213 ]


3. The following series of observations is known to have a
business cycle with a period of 4 years. Find the trend values by the
moving average method, selecting an appropriate period of the moving
averages :

Year : 1951 1952 1953 1954 1955 1956 1957 1958 1959
Y    :  506  620 1036  673  588  696 1116  738  663

Year : 1960 1961 1962 1963 1964 1965
Y    :  773 1189  818  745  845 1276

(C. U., 1971)

[ Ans. : 4-yearly m.a. : 719.0, 738.8, 758.3, 776.4, 793.9, 812.9, 831.6,
850.8, 871.0, 890.3, 910.1 ]

4. The following table shows the number of salesmen working
for a certain concern :

Year   : 1970 1971 1972 1973 1974
Number :   28   38   46   40   56

Use the method of least squares to fit a straight line and
estimate the number of salesmen in 1975.

[ Ans. : y = 41.6 + 5.8t ; 59 ]
5. Fit a straight line trend equation by the method of least
squares and estimate the value for 1969 :

Year  : 1960 1961 1962 1963 1964 1965 1966 1967
Value :  380  400  650  720  690  600  870  930

(C. A. 1978)

[ Ans. : y = 655 + 35.8t ; 1048.8 ]

6. Below are given the figures of production in thousand tons
of a sugar factory :

Year       : 1969 1970 1971 1972 1973 1974 1975
Production :   77   88   94   85   91   98   90

Fit a straight line by the method of least squares and show the
trend values.

[ Ans. : y = 89 + 2t ; 83, 85, 87, 89, 91, 93, 95 ]

7. Fit a straight line trend by the method of least squares to
the following data and obtain the trend value for the year 1972 :

Year        : 1960 1961 1962 1963 1964 1965
Production
(Lakh tons) :  3.6  3.8  4.4  4.7  5.6  7.3

Year        : 1966 1967 1968 1969 1970 1971
Production
(Lakh tons) :  7.1  7.6  7.7  9.0  9.0 10.1

[ Ans. : y = 6.658 + 0.599t ; 10.55 ]


8. Compute the trend values by the method of least squares
from the data given below :

Year         : 1962 1963 1964 1965 1966 1967 1968 1969
No. of sheep
(in Lakhs)   :   56   55   51   47   42   38   35   32

[ Ans. : y = 44.5 - 3.71t ; origin : 1965.5 ]

9. Fit an equation of the type y = a + bt + ct² to the following
data :

Year           : 1971 1972 1973 1974 1975 1976
Sales (in
million tons)  :  100  105  115  100  112  118

[ Ans. : y = 107.66 + 2.74t + 0.23t² ; origin : middle of 1973-74 ]

10. Find the quarterly trend values from the following data by the
moving average method, using an appropriate period :

Quarterly Output (million tons)

Quarter/Year    1964    1965    1966
     I           52      59      57
     II          54      63      61
     III         67      75      72
     IV          55      65      60

(I. C. W. A., '71)

[ Ans. : 4-quarter moving averages : 57.9, 59.9, 62.0, 64.2,
65.2, 64.8, 64.1, 63.1 (million tons) ]

11. Find the trend values (mixed with cyclical movements, if
any) from the following data by the method of moving averages :


Quarter/Yeor 193019311989 1998 
I W979 orl Matdo BAZ erad 945 
by Bhar. ery. Be 49 
IIT 48° 55tCtC«CBSCO 
Iy imate oy ae ee 


(I. C. W. A., 1968)

[ Ans. : 4-quarterly moving averages : 37.1, 39.1, 41.2, 43.9, 45.9,
47.9, 50.0, 52.2, 53.2, 52.8, 52.1, 51.1 ]

12. Calculate the seasonal indices by the ratio to moving
average method from the following data :


‘o> Year/Quarter I IDs Il IV 
anny 1975 ISOS 62a, aol 63 
keiMie? "1976" 65 OS aes OB 61 

1977 68 63 _ 63 67 


[ Ans. : 105.30, 95.21, 100.97, 98.52 ]

13. Deseasonalise the following production data by the method
of moving average :

Quarterly Output ('000 tons)

Quarter/Year    1970    1971    1972    1973
     I           30      49      35      75
     II          49      50      62      79
     III         50      61      60      65
     IV          35      20      25      70

(C. U., 1978)

[ Ans. : 28.28, 38.70, 39.28, 57.74, 47.28, 39.70, 50.28, 42.74, 33.28,
51.70, 49.28, 47.74, 73.28, 68.70, 54.28, 92.74 ]
14. In a study of its sales, a motor company obtained the
following least square equation :
y = 1600 + 200x
(origin at 1950 ; x unit = 1 year ; y = number of units sold annually)

The company has physical facilities to produce only 3,600 units a
year and it believes that it is reasonable to assume that at least for the
next decade the trend will continue as before :

(a) What is the average annual increase in the number of units
sold ?

(b) Estimate the annual sales for 1965. How much in excess of
the company's present physical capacity is this estimated value ?
(C. U., 1969)

[ Ans. : (a) 200 units, (b) 4600 units, 1000 units ]


15. The trend equation fitted to a series of sales data is given by
y = 1600 + 200x
(origin at 1950 ; x unit = 1 year ; y = number of units sold yearly)

The company has the production capacity of 3,600 units a year.
Find by what year the company's expected sales will have equalled its
present production capacity (assuming that at least for the next decade
the trend will continue as before). (G. U., 1966)

[ Ans. : year 1960 ]

16. Find the seasonal variations by the method of link relatives
from the data given below :

Year/Quarter     I     II    III    IV
    1975        65     58     56    61
    1976        68     63     63    67
    1977        70     59     56    52
    1978        60     55     51    58

[ Ans. : 109.3, 97.8, 93.9, 99.0 ]
17. Deseasonalise the following sales data and interpret them :

Quarter    Sales (Rs. '000)    Seasonal Indices
   I            23.7                0.78
   II           25.2                1.24
   III          21.4                0.50
   IV           65.4                1.48

(C. U., 1968)

[ Ans. : 30.4, 20.3, 42.8, 44.2 ; Multiplicative model used ]

18. The seasonal indices of the sale of ready-made garments of a
particular type in a certain store are given below :

Quarter        Seasonal Indices
Jan.-March            98
April-June            89
July-Sept.            83
Oct.-Dec.            130

If the total sales in the 1st quarter of the year be worth Rs. 10,000,
determine how much worth of garments of this type should be kept in
stock by the store to meet the demand in each of the remaining
quarters.

[ Ans. : Rs. 9081, Rs. 8469, Rs. 13265 ]

19. For the following data, obtain seasonal indices by any
method that you consider suitable :

Passenger miles flown by domestic services of U.K.
Airlines (millions of passenger miles)

Year/Quarter     I     II    III    IV
    1961        77    166    252   104
    1962        99    191    287   123
    1963       113    229    316   156
    1964       152    255    357   180

(C. U., 1971)

[ Ans. : Moving average method, additive model : -72.27,
23.84, 106.30, -57.87 ]


20. On the basis of quarterly sales (in Rs. Lakhs) of a certain
commodity for the years 1961-'65, the following calculations were
made :

Trend : y = 25.0 + 0.6t with origin at the 1st quarter of 1961, where
t = time units (one quarter) and y = quarterly sales (Rs. Lakhs).

Seasonal Variations :

Quarter          :   I    II   III   IV
Seasonal Indices :  90    95   110  105

Estimate the quarterly sales for the year 1962 (use multiplicative
model). (I. C. W. A., 1972)

[ Ans. : 24.66, 26.60, 31.46, 30.66 ]


SET THEORY 


Introduction 

Georg Cantor (1845-1918), a German mathematician, was the
creator of set theory. On the basis of set theory he developed
mathematical analysis. His work on the theory of sets was accepted as a
fundamental contribution to mathematics.


Set 


In our daily life we use phrases like a bunch of keys, a set of
books, a tea set, a pack of cards, a team of players, a class of students,
etc. Here the words bunch, set, pack, team, class all indicate
collections or aggregates. In mathematics also we are to deal with
collections. Mathematicians use the word set for a well-defined
collection of objects.

A set is a well-defined collection of distinct objects. Each object
is said to be an element (or member) of the set.

We shall use capital letters A, B, C or X, Y, Z or P, Q, R to
indicate sets and small letters a, b, c or x, y, z or p, q, r to denote
elements of a set.


Symbol ∈

The symbol ∈ is used to denote 'is an element of' or 'is a member
of' or 'belongs to'. Thus x ∈ A is read as x is an element of A or
x belongs to A. Again, for denoting 'not an element of' or 'does not
belong to' we put a diagonal line through ∈, thus ∉. So if y does not
belong to A, we may write (using the above symbol) y ∉ A.

EXAMPLE :
If V is the set of all vowels, we can say e ∈ V and f ∉ V.


Methods of Describing a Set. 
There are two methods : 


(1) Tabular Method (or Roster Method) 


As mentioned before, a set is denoted by capital letters, i.e., A, B,
X, Y, P, Q, etc. The general way of designating a set is writing all
the elements (or members) within brackets ( ) or { } or [ ]. Thus a
set A may be written as A = {green, red, blue}.

The order of listing the elements is not important, so the
same set A may be written again as A = {blue, green, red}. Further,
any element may be repeated any number of times without
disturbing the set. The same set A can be taken as A = {blue, blue,
green, red, red, red}.

For a large (finite) number of elements we will use dots to represent
the elements within the set. If A be the set of odd numbers up to 17,
we may write (for convenience) A = {1, 3, 5, ..., 17}. Again, if A be
a set of Prime Ministers, A = {Nehru, Sastri, Gandhi, Desai}.


(2) Selector Method (or Rule Method) 


In this method, if all the elements of a set possess some common
property, which distinguishes them from non-elements, then that
property may be used to designate the set. For example, if x (an
element of a set B) has the property of being an odd positive integer
such that 3 is less than or equal to x and x is less than or equal
to 17, then in short we may write

B = {x : x is an odd positive integer and 3 ≤ x ≤ 17}
Similarly C = {x : x is a day beginning with Monday}.

Note : (1) ':' used after x is to be read as 'such that'. In some cases '|' (a
vertical line) is used, which is also to be read as 'such that'.

(2) If the elements do not possess a common property, then this method is
not applicable.
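The selector method corresponds closely to a set comprehension in Python. The sketch below is an illustration added here, not part of the original text; the search range 1 to 99 is an arbitrary assumption, needed only because a program has to enumerate candidates from some finite pool.

```python
# Selector (rule) method: B = {x : x is an odd positive integer and 3 <= x <= 17}.
B = {x for x in range(1, 100) if x % 2 == 1 and 3 <= x <= 17}

# Tabular (roster) method: the same set written out element by element.
B_roster = {3, 5, 7, 9, 11, 13, 15, 17}

print(B == B_roster)  # both descriptions designate the same set
print(5 in B, 4 in B)  # membership tests for 5 ∈ B and 4 ∈ B
```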


Types of Sets 


(1) Finite Set :
It is a set consisting of a finite number of elements.

EXAMPLE :
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, ..., 50}
C = {x : x is number of students in a class}.

(2) Infinite Set :
A set having an infinite number of elements.

EXAMPLE :
A = {1, 2, 3, ...}
B = {2, 4, 6, ...}
C = {x : x is number of stars in the sky}.

(3) Null or Empty or Void Set :
It is a set having no element in it, and is usually denoted by φ
(read as phi) or { }.

EXAMPLE :
The set of persons moving in air without any machine.
The set of positive numbers less than zero.
A = {x : x is a perfect square of an integer and 5 < x < 8}.
B = {x : x is a negative integer whose square is -1}.


(4) Equal Set :

Two sets A and B are said to be equal if all the elements of
A belong to B and all the elements of B belong to A.

EXAMPLE :
A = {1, 2, 3, 4}, B = {3, 1, 2, 4},
or A = {a, b, c}, B = {a, a, a, c, c, b, b, b, b}.

Note : The order of writing the elements or repetition of elements does not
change the nature of a set.

Now, A = B if and only if {x ∈ A ⇔ x ∈ B}.

Let A = {x : x² - 7x + 12 = 0}, B = {3, 4}, C = {3, 3, 4, 3, 4}.

Then A = B = C, since elements which belong to any one set also
belong to the other two sets.

If A = {2, 3, 4}, B = {4, 2, 3},
   X = {1, 3, 4}, Y = {2, 3, 5},
then A = B, and X ≠ Y.

Again let A = {x : x is a letter in the word STRAND}
          B = {x : x is a letter in the word STANDARD}
          C = {x : x is a letter in the word STANDING}

Here A = B, B ≠ C, A ≠ C.
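Python's built-in set type follows the same rule: order and repetition do not matter for equality. The sketch below is an added illustration; the range searched for roots of x² - 7x + 12 = 0 is an arbitrary assumption.

```python
# Integer roots of x^2 - 7x + 12 = 0, found by searching a small range.
A = {x for x in range(-100, 101) if x * x - 7 * x + 12 == 0}
B = {3, 4}
C = {3, 3, 4, 3, 4}  # repeated elements collapse to {3, 4}
all_equal = (A == B == C)

# Sets of letters in the three words.
strand = set("STRAND")
standard = set("STANDARD")
standing = set("STANDING")

print(all_equal)
print(strand == standard, standard == standing, strand == standing)
```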
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(5) Equivalent Set : 


If the total number of elements of one set is equal to the total
number of elements of another set, then the two sets are said to be
equivalent. It is not essential that the elements of the two sets should
be the same.


EXAMPLE : 
A={1, 2, 3,4} B={b,a,1,1} 
In A, there are 4 elements, 1, 2, 3, 4 
In B, there are 4 elements, b, a, 1,1 (one to one correspondence) 
Hence A ≡ B (the symbol ≡ is used to denote equivalent sets).
A={8,5,8,9} B={5, 5, 8, 9, 3,8, 9} 
O={b, 0, 0, kt 
Here A=B and A=0 


(6) Sub-set :

If each element of the set A belongs to the set B, then A is said
to be a sub-set of B. Symbolically, the relation is A ⊆ B and is read as A
is a sub-set of B or A is contained in B or A is included in B.

It may be mentioned here that A will usually have fewer elements
than B ; it may have the same elements as B, but in no case can A
have an element that is not in B.

EXAMPLE :
If B = {1, 2, 3}, then the sub-sets of B are {1}, {2}, {3}, {1, 2},
{2, 3}, {1, 3}, {1, 2, 3} and φ.


Note : (1) Every element of a set is an element of the same set ; therefore
every set is a sub-set of itself, i.e., A ⊆ A.

(2) The null set contains no element, so all the elements of φ belong to every set,
i.e., φ ⊆ A.

(3) It follows that every set has at least two sub-sets, i.e., the null set and the
set itself.

(4) If A ⊆ B and B ⊆ C ⇒ A ⊆ C.

(5) If A ⊆ B and B ⊆ A ⇒ A = B.

(6) If A ⊆ φ, then A = φ.

(7) Proper Sub-set :

If each and every element of a set A is an element of B and
there exists at least one element of B that does not belong to A, then
the set A is said to be a proper sub-set of B (or B is called a super set of A).
Symbolically, we may write

A ⊂ B (read as A is a proper sub-set of B).

And B ⊂ A means A is a super set of B.

If B = {a, b, c}, then its proper sub-sets are {a}, {b}, {c}, {a, b},
{b, c}, {a, c}, φ.
(8) Power Set :

The family of all sub-sets of a given set A is known as the power set
of A and is denoted by P(A).

EXAMPLE :

(i) If A = {a}, then P(A) = { {a}, φ }.

(ii) If A = {a, b}, then P(A) = { {a}, {b}, {a, b}, φ }.

(iii) If A = {a, b, c}, then
P(A) = { {a}, {b}, {c}, {a, b}, {b, c}, {a, c}, {a, b, c}, φ }.

Thus when the number of elements of A is 1, the number of
sub-sets is 2 ; when the number of elements of A is 2, the
number of sub-sets is 4 = 2² ; and when it is 3, the number of sub-sets is
8 = 2³. So, if A has n elements, P(A) will have 2ⁿ elements, i.e., A has 2ⁿ sub-sets.
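The count 2ⁿ can be confirmed by listing the sub-sets with the standard itertools module. This is an added sketch; the function name power_set is an illustrative choice.

```python
from itertools import combinations


def power_set(s):
    """Return all sub-sets of s as a list of sets, smallest first."""
    items = sorted(s)
    return [set(combo)
            for size in range(len(items) + 1)
            for combo in combinations(items, size)]


subsets = power_set({"a", "b", "c"})
for subset in subsets:
    print(subset if subset else "φ")  # the empty set is printed as φ

print(len(subsets))
```

Each element is either included in a sub-set or left out, which gives 2 × 2 × ... × 2 = 2ⁿ possibilities.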


Universal Set 


In mathematical discussion, generally we consider all the sets to
be sub-sets of a fixed set, known as the Universal set or Universe, denoted
by U. A universal set may be finite or infinite.

EXAMPLE :

(i) A pack of cards may be taken as a universal set for the set of
diamonds or spades.

(ii) The set of integers is a universal set for the set of even or
odd numbers.


Venn diagram 


John Venn (1834-1923), an English logician, invented this diagram
to represent sets pictorially. The diagrams display operations
on sets. In a Venn diagram, we shall denote the Universe U (or X) by a
region enclosed within a rectangle, and any sub-set of U will be shown
by a circle or a closed curve.

Union of Sets

If A and B are two given sets, then their union is the set of those
elements that belong either to A or to B (or to both).
The union of A and B is denoted symbolically as A∪B (read as A
union B or A cup B). In symbols,
A∪B = {x : x ∈ A or x ∈ B}

EXAMPLE :
(i) Let A = {1, 2, 3, 4, 5}, B = {2, 3, 5, 6, 7}, C = {2, 4, 7, 8, 9}
Then A∪B = {1, 2, 3, 4, 5, 6, 7}, and B∪A = {1, 2, 3, 4, 5, 6, 7}
∴ A∪B = B∪A (commutative law)
Again (A∪B)∪C = {1, 2, 3, 4, 5, 6, 7, 8, 9}
(B∪C) = {2, 3, 4, 5, 6, 7, 8, 9}
A∪(B∪C) = {1, 2, 3, 4, 5, 6, 7, 8, 9}
∴ (A∪B)∪C = A∪(B∪C) (associative law)

(ii) If A = {a, b, c, d}, B = {0}, C = φ
Then A∪B = {0, a, b, c, d}, A∪C = {a, b, c, d} = A,
and B∪C = {0}

Union of sets may be illustrated more clearly by using a Venn
Diagram, in which the shaded region indicates the union of
A and B, i.e., A∪B.

Intersection of Sets

If A and B are two given sets, then their
intersection is the set of those elements that
belong to both A and B, and is denoted by A∩B
(read as A intersection B or A cap B).
In symbols, A∩B = {x : x ∈ A and x ∈ B}

EXAMPLE :
(i) For the same sets A, B, C given above in example (i),

A∩B = {2, 3, 5} ; here the elements 2, 3, 5 belong both to A
and B. And B∩A = {2, 3, 5}. ∴ A∩B = B∩A (commutative law)

(A∩B)∩C = {2}
(B∩C) = {2, 7}, A∩(B∩C) = {2}.
∴ (A∩B)∩C = A∩(B∩C) (associative law)

(ii) For the sets A, B, C given in example (ii) above,

A∩B = φ, B∩C = φ, A∩C = φ.

Intersection of two sets A and B is illustrated clearly by a
Venn Diagram, in which the shaded portion represents the
intersection of A and B, i.e., A∩B.


Disjoint Sets

Two sets A and B are said to be disjoint
if their intersection is empty, i.e., no element of
A belongs to B.

EXAMPLE :

A = {1, 3, 5}, B = {2, 4},
A∩B = φ. ∴ A and B are disjoint sets.


Difference of Two Sets 


If A and B are two sets, then the set containing all those elements
of A which do not belong to B is known as the difference of the two sets, and
is denoted by the symbol A~B or A-B (read A difference B). Now,
A~B is said to be obtained by subtracting B from A.

In symbols, A~B = {x : x ∈ A and x ∉ B}.

EXAMPLE :
(i) If A = {1, 2, 3, 4, 5}, B = {3, 5, 6, 7}, then A~B = {1, 2, 4}
(ii) If A = {x : x is an integer and 1 ≤ x ≤ 12},
B = {x : x is an integer and 7 ≤ x ≤ 14},
then A~B = {x : x is an integer and 1 ≤ x ≤ 6}
A~B is represented by a Venn Diagram in which the shaded portion
is the part of A lying outside B.


Complement of a Set

Let U be the universal set and A be its
sub-set. Then the complement set of A in
relation to U is that set whose elements belong
to U and not to A. This is denoted by A'
(= U~A) or Aᶜ or Ā.

In symbols, A' = {x : x ∈ U and x ∉ A}. We may also write
A' = {x : x ∉ A}.

Remarks :

1. The union of any set A and its complement A' is the universal set, i.e.,
A∪A' = U.

2. The intersection of any set A and its complement A' is the null set, i.e.,
A∩A' = φ.

EXAMPLE :
U = {1, 2, 3, ..., 10}, A = {2, 4, 7}
A' (= U~A) = {1, 3, 5, 6, 8, 9, 10}
Now A∪A' = {1, 2, 3, ..., 10} = U, A∩A' = φ

Again (A')' = {2, 4, 7} = A (i.e., the complement of the complement of
A is equal to A itself).

U' = φ (i.e., the complement of a universal set is empty).
Again, the complement of an empty set is a universal set, i.e., φ' = U.
If A ⊂ B then B' ⊂ A' for sets A and B.
The complement of A is represented in a Venn Diagram by
the shaded region outside A and inside U.


Symmetric Difference 

For two sets A and B, the symmetric
difference is (A~B)∪(B~A) and is
denoted by A Δ B (read as "A symmetric
difference B").


EXAMPLE :
Let A = {1, 2, 3, 4, 8}, B = {2, 4, 6, 7}
Now, A~B = {1, 3, 8}, B~A = {6, 7}
∴ A Δ B = {1, 3, 8} ∪ {6, 7} = {1, 3, 6, 7, 8}


By Venn Diagram :
[Venn diagram: two overlapping circles A and B; everything inside the
circles except the common part A∩B is shaded.]
A Δ B is represented by the shaded region.


It is clear that A Δ B denotes the
set of all those elements that belong to A
or B, except those which belong
to both A and B, i.e., it is the set of elements
which belong to A or B but not to both.
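As a quick check of the example, the symmetric difference can be computed in three equivalent ways in Python. This is only an illustrative sketch; the variable names are chosen here and are not part of the text.

```python
A = {1, 2, 3, 4, 8}
B = {2, 4, 6, 7}

by_definition = (A - B) | (B - A)        # (A~B) ∪ (B~A)
by_operator = A ^ B                      # Python's built-in symmetric difference
union_minus_common = (A | B) - (A & B)   # in A or B, but not in both

print(sorted(by_definition))   # [1, 3, 6, 7, 8]
```

All three expressions give the same set, which agrees with the description "in A or B but not in both".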


Difference between φ, {0} and {φ}
φ is the null set.
{0} is a singleton whose only element is zero.
{φ} is also a singleton whose only element is the null set.


Worked out Examples
1. Rewrite the following examples using set notations :
(i) First ten even natural numbers.
(ii) Set of days of a week.
(iii) Set of months in a year which have 30 days.
(iv) The numbers 3, 6, 9, 12, 15.
(v) The letters m, a, t, h, e, m, a, t, i, c, s.
SOLUTION :
(i) A = {2, 4, 6, 8, 10, 12, 14, 16, 18, 20} (Tabular method)
= {x : x is an even integer and 2 ≤ x ≤ 20} (Selector
method)


(ii) A = {Sunday, Monday, ..., Saturday} (Tabular)
= {x : x is a day in a week} (Selector)


(iii) A = {April, June, September, November} (Tabular)
= {x : x is a month of 30 days} (Selector)


(iv) A = {x : x is a positive multiple of 3 and 3 ≤ x ≤ 15}
(v) A = {x : x is a letter in the word mathematics}


2. Represent the following sets in selector method :

(i) all numbers less than 15

(ii) all even numbers

(iii) all real numbers in the closed interval [1, 11]

(iv) all real numbers in the open interval (−2, 3)
SOLUTION :

Taking A to be the set of numbers in every case :

(i) {x : x ∈ A and x < 15}

(ii) {x : x ∈ A and x is a multiple of 2}

(iii) {x : x ∈ A and 1 ≤ x ≤ 11}
(iv) {x : x ∈ A and −2 < x < 3}
3. Given A = {the odd numbers between 2 and 10}, B = {3, 4, 5},
C = {all integers less than 26 which are perfect squares}, D = {1, x, x²},
E = φ, F = {1}, G = {all the digits in the product of 5 and 47},
H = {all numbers between 90 and 110 which are multiples of 7},
I = {1, 2, 3, 4, 5, 6}, K = {17}.


For each of the following sets, write down the set given above
to which it is equal :


(i) {1, 4, 9, 16, 25} (ii) {x : x ∈ I and x² − 7 = 0}
(iii) {all prime factors of 30} (iv) {x : x ∈ N and 1 ≤ x < 7}
(v) {3, 5, 7, 9} (vi) {x², 1, x}

(vii) {all prime numbers greater than 13 and less than 19}

(viii) {all numbers obtained by adding a pair of the numbers from 1, 2, 3}

(ix) {x : x ∈ N and x = 1} (x) {91, 98, 105}
SOLUTION :

(i) C, (ii) E, (iii) G [prime factors of 30 are 2, 3, 5; again 5 × 47 =
235, digits are 2, 3, 5], (iv) I, (v) A, (vi) D, (vii) K, (viii) B [adding
each pair, we find 3, 4, 5], (ix) F, (x) H.

4. (i) Is the set A = {x : x < x} a null set ?

(ii) Is the set B = {x : x + 4 = 4} a null set ?


(iii) Is the set C = {x : x is a positive number less than zero}
a null set ?


SOLUTION :
(i) Null, as there exists no number less than itself.
(ii) Not null, as the set has one element, zero.
(iii) Null, as there exists no positive number less than zero.


5. State with reasons whether each of the following statements
is true or false :


(i) {1} ∈ {1, 2, 3} (ii) 1 ∈ {1, 2, 3}
(iii) {1} ⊂ {1, 2, 3} (iv) 1 ⊂ {1, 2, 3}
(v) {1, 2} ∈ {1, 2, 3} (vi) {1, 2} ⊂ {1, 2, 3}
(vii) {1, 2, 3} = {2, 3, 4} (viii) {1, 2, 3} = {3, 2, 3, 1, 2, 1}
(ix) {1, 2, 3} ∈ {1, 2, 3} (x) {1, 2, 3} ⊂ {3, 1, 2}
(xi) φ ∈ {1, 2, 3} (xii) φ ⊂ {1, 2, 3}
SOLUTION :


(i) False, {1} is a singleton set and not an element of {1, 2, 3}.
(ii) True, since 1 is an element of {1, 2, 3}.
(iii) True, {1} is a proper sub-set of {1, 2, 3}.
(iv) False, an element cannot be a sub-set of a set.
(v) False, {1, 2} is not an element but a sub-set of {1, 2, 3}.
(vi) True, {1, 2} is a proper sub-set of {1, 2, 3}.
(vii) False, as 1 ∉ {2, 3, 4} and 4 ∉ {1, 2, 3}.
(viii) True, since both sets contain the same elements.
(ix) False, the set {1, 2, 3} is not one of its own elements 1, 2, 3.
(x) False, as {1, 2, 3} is not a proper sub-set of {3, 1, 2}; the two sets are equal.
(xi) False, the null set is not an element of {1, 2, 3}.
(xii) True, the null set is a sub-set of every set.
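The distinction between "is an element of" and "is a sub-set of" can be tried out in Python, where `in` tests membership and `<` tests for a proper sub-set. The sketch below is only illustrative: `frozenset` is used because Python requires a hashable value on the left of `in`, and statement (iv) is omitted because a number cannot be compared with a set as a sub-set.

```python
S = {1, 2, 3}

checks = {
    "(i)": frozenset({1}) in S,              # is the set {1} an element of S?
    "(ii)": 1 in S,                          # is the number 1 an element of S?
    "(iii)": {1} < S,                        # is {1} a proper sub-set of S?
    "(v)": frozenset({1, 2}) in S,
    "(vi)": {1, 2} < S,
    "(vii)": {1, 2, 3} == {2, 3, 4},
    "(viii)": {1, 2, 3} == {3, 2, 3, 1, 2, 1},   # repeats and order do not matter
    "(ix)": frozenset({1, 2, 3}) in S,
    "(x)": {1, 2, 3} < {3, 1, 2},            # equal sets, so not a proper sub-set
    "(xi)": frozenset() in S,                # is the null set an element of S?
    "(xii)": set() < S,                      # is the null set a proper sub-set of S?
}

for label, value in checks.items():
    print(label, value)
```

The printed truth values agree with the solution above for every statement included.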
Few Properties
On Union of Sets :
Some properties for sets A, B and C are :
(1) A ⊂ (A∪B) and B ⊂ (A∪B)
(2) A∪A = A
(3) A∪φ = A (identity property)
(4) A∪B = φ ⇒ A = φ, B = φ
(5) A∪B = B∪A (commutative property)
(6) (A∪B)∪C = A∪(B∪C) (associative property)


PROOF OF COMMUTATIVE LAW : A∪B = B∪A
We are to show (i) A∪B ⊂ B∪A
(ii) B∪A ⊂ A∪B
(i) Let x be any element of A∪B; then
x ∈ A∪B ⇒ x ∈ A or x ∈ B
⇒ x ∈ B or x ∈ A
⇒ x ∈ (B∪A), which shows that any element of
A∪B is also an element of B∪A.
∴ A∪B ⊂ B∪A ... (1)


(ii) Let any element y belong to B∪A; then
y ∈ B∪A ⇒ y ∈ B or y ∈ A
⇒ y ∈ A or y ∈ B
⇒ y ∈ (A∪B), which shows that any element of
B∪A is also an element of A∪B.
∴ B∪A ⊂ A∪B ... (2)
From (1) and (2) we find A∪B = B∪A.


PROOF OF ASSOCIATIVE PROPERTY : (A∪B)∪C = A∪(B∪C)
We are to show (i) (A∪B)∪C ⊂ A∪(B∪C)
(ii) A∪(B∪C) ⊂ (A∪B)∪C
(i) Let x be any element of (A∪B)∪C. Then
x ∈ (A∪B)∪C ⇒ x ∈ (A∪B) or x ∈ C
⇒ (x ∈ A or x ∈ B) or x ∈ C
⇒ x ∈ A or (x ∈ B or x ∈ C)
⇒ x ∈ A or x ∈ (B∪C)
⇒ x ∈ A∪(B∪C)
which shows that every element of (A∪B)∪C is also an element
of A∪(B∪C).


∴ (A∪B)∪C ⊂ A∪(B∪C) ... (1)
(ii) Let y be any element of A∪(B∪C). Then

y ∈ A∪(B∪C) ⇒ y ∈ A or y ∈ (B∪C)
⇒ y ∈ A or (y ∈ B or y ∈ C)
⇒ (y ∈ A or y ∈ B) or y ∈ C
⇒ y ∈ (A∪B) or y ∈ C
⇒ y ∈ (A∪B)∪C

So we have A∪(B∪C) ⊂ (A∪B)∪C ... (2)


From (1) and (2) we can say, by the definition of equality of sets,
(A∪B)∪C = A∪(B∪C).


By Venn Diagram :


[Venn diagrams: in the first, A∪B and C are shaded and the total shaded
region is (A∪B)∪C; in the second, A and B∪C are shaded and the total
shaded region is A∪(B∪C).]


On Intersection of Sets :

Some properties for sets A, B and C are :
(1) A∩B ⊂ A and A∩B ⊂ B
(2) A∩φ = φ
(3) A∩A = A
(4) A∩B = B∩A (commutative property)
(5) (A∩B)∩C = A∩(B∩C) (associative property)
(6) If A ⊂ B then A∩B = A, and if B ⊂ A then A∩B = B.
(7) If A ⊂ B and B ⊂ C, then A ⊂ (B∩C).
The proof of the commutative property, i.e., A∩B = B∩A, is similar to that of
A∪B = B∪A shown above and is left to the students as an
exercise.


PROOF OF ASSOCIATIVE PROPERTY : (A∩B)∩C = A∩(B∩C)
[C.A. Entr., May '76]
We are to show (i) (A∩B)∩C ⊂ A∩(B∩C)
(ii) A∩(B∩C) ⊂ (A∩B)∩C
(i) Let x be any element of (A∩B)∩C. Then
x ∈ (A∩B)∩C ⇒ x ∈ (A∩B) and x ∈ C
⇒ (x ∈ A and x ∈ B) and x ∈ C
⇒ x ∈ A and (x ∈ B and x ∈ C)
⇒ x ∈ A and x ∈ (B∩C)
⇒ x ∈ A∩(B∩C)


Thus every element x of (A∩B)∩C is also an element of
A∩(B∩C).
∴ (A∩B)∩C ⊂ A∩(B∩C) ... (1)
(ii) Let y be any element of A∩(B∩C). Then
y ∈ A∩(B∩C) ⇒ y ∈ A and y ∈ (B∩C)
⇒ y ∈ A and (y ∈ B and y ∈ C)
⇒ (y ∈ A and y ∈ B) and y ∈ C
⇒ y ∈ (A∩B) and y ∈ C
⇒ y ∈ (A∩B)∩C


Thus every element y of A∩(B∩C) is also an element of
(A∩B)∩C.


∴ A∩(B∩C) ⊂ (A∩B)∩C ... (2)
From (1) and (2), we have A∩(B∩C) = (A∩B)∩C.


Note: Properties (6) and (7) may be illustrated by Venn diagrams, which is left
to students as an exercise.


On Union and Intersection of Sets (Union distributes over Intersection) :


DISTRIBUTIVE LAW :
Let A, B and C be any three sets; prove that A∪(B∩C)
= (A∪B)∩(A∪C). Also verify the result by Venn diagram.
[C.A. Inter., May '75, Nov. '76]
Here we are to show (i) A∪(B∩C) ⊂ (A∪B)∩(A∪C)
(ii) (A∪B)∩(A∪C) ⊂ A∪(B∩C)
(i) Let x be any arbitrary element of A∪(B∩C). Then
x ∈ A∪(B∩C) ⇒ x ∈ A or x ∈ (B∩C)
⇒ x ∈ A or (x ∈ B and x ∈ C)
⇒ (x ∈ A or x ∈ B) and (x ∈ A or x ∈ C)
⇒ x ∈ (A∪B) and x ∈ (A∪C)
⇒ x ∈ (A∪B)∩(A∪C)
Thus every element x of A∪(B∩C) also belongs to
(A∪B)∩(A∪C).
∴ A∪(B∩C) ⊂ (A∪B)∩(A∪C) ... (1)
(ii) Let y be any element of (A∪B)∩(A∪C). Then
y ∈ (A∪B)∩(A∪C) ⇒ y ∈ (A∪B) and y ∈ (A∪C)
⇒ (y ∈ A or y ∈ B) and (y ∈ A or y ∈ C)
⇒ y ∈ A or (y ∈ B and y ∈ C)
⇒ y ∈ A or y ∈ (B∩C)
⇒ y ∈ A∪(B∩C)
In the same way, (A∪B)∩(A∪C) ⊂ A∪(B∩C) ... (2)
From (1) and (2), we can say A∪(B∩C) = (A∪B)∩(A∪C).


By Venn Diagram :


[Venn diagrams: in the first, the total shaded region is A∪(B∩C); in the
second, A∪B and A∪C are shaded and the doubly shaded region is
(A∪B)∩(A∪C).]


(Intersection distributes over union)


2. Let A, B and C be any three sets; prove that A∩(B∪C) =
(A∩B)∪(A∩C).


As before, we are to show (i) A∩(B∪C) ⊂ (A∩B)∪(A∩C)
(ii) (A∩B)∪(A∩C) ⊂ A∩(B∪C)
(i) Let x ∈ A∩(B∪C) ⇒ x ∈ A and x ∈ (B∪C)
⇒ x ∈ A and (x ∈ B or x ∈ C)
⇒ (x ∈ A and x ∈ B) or (x ∈ A and x ∈ C)
⇒ x ∈ (A∩B)∪(A∩C)
Thus A∩(B∪C) ⊂ (A∩B)∪(A∩C) ... (1)
(ii) Let y ∈ (A∩B)∪(A∩C) ⇒ y ∈ (A∩B) or y ∈ (A∩C)
⇒ (y ∈ A and y ∈ B) or (y ∈ A and y ∈ C)
⇒ y ∈ A and (y ∈ B or y ∈ C)
⇒ y ∈ A∩(B∪C)
Thus (A∩B)∪(A∩C) ⊂ A∩(B∪C) ... (2)
From (1) and (2), we find A∩(B∪C) = (A∩B)∪(A∩C).
Note: The student may verify the result by Venn diagram.
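Besides a Venn diagram, the two distributive laws can be checked mechanically on a small universal set. The sketch below tries every choice of A, B and C among the sub-sets of a hypothetical three-element universe; the helper `all_subsets` is a name introduced here for illustration. Such a finite check is only a sanity test and does not replace the proofs above.

```python
from itertools import combinations

def all_subsets(universe):
    """Return every sub-set of a small finite universe as a list of sets."""
    items = sorted(universe)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

U = {1, 2, 3}                 # hypothetical small universal set
subsets = all_subsets(U)      # 2**3 = 8 sub-sets

# Union distributes over intersection.
union_over_intersection = all(
    A | (B & C) == (A | B) & (A | C)
    for A in subsets for B in subsets for C in subsets
)

# Intersection distributes over union.
intersection_over_union = all(
    A & (B | C) == (A & B) | (A & C)
    for A in subsets for B in subsets for C in subsets
)

print(union_over_intersection, intersection_over_union)   # True True
```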
Duality


Union and intersection are dual operations to each other. If we
can establish the validity of one law having ∪ and ∩ as operations,
then its dual, obtained by replacing ∪ by ∩ and ∩ by ∪, will also be true.


De Morgan's Laws :


1. Complement of a union is the intersection of the complements.

For any two sets A and B, prove that (A∪B)′ = A′∩B′.

[C.A. Entrance, May '76]
We are to show (i) (A∪B)′ ⊂ A′∩B′
(ii) A′∩B′ ⊂ (A∪B)′
(i) Let x ∈ (A∪B)′ ⇒ x ∉ (A∪B)
⇒ x ∉ A and x ∉ B
⇒ x ∈ A′ and x ∈ B′
⇒ x ∈ A′∩B′
Thus every element of (A∪B)′ is also a member of A′∩B′.
∴ (A∪B)′ ⊂ A′∩B′ ... (1)


(ii) Let y ∈ A′∩B′ ⇒ y ∈ A′ and y ∈ B′
⇒ y ∉ A and y ∉ B
⇒ y ∉ (A∪B)
⇒ y ∈ (A∪B)′
Thus every element belonging to A′∩B′ also belongs to (A∪B)′.
∴ A′∩B′ ⊂ (A∪B)′ ... (2)


From (1) and (2), we find (A∪B)′ = A′∩B′.
Note: It may be noted that when the complement of a union is taken, the
union changes to an intersection, and vice versa.


2. Complement of an intersection is the union of the complements,
i.e., (A∩B)′ = A′∪B′.


(The proof is left to the student. The student may follow the
above method.)
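Both of De Morgan's laws can be tried on concrete sets. The sketch below reuses the universal set U = {1, 2, ..., 10} and A = {2, 4, 7} from the complement example; the set B and the helper `complement` are hypothetical, introduced only for illustration.

```python
U = set(range(1, 11))      # universal set {1, 2, ..., 10}
A = {2, 4, 7}
B = {1, 2, 3}              # hypothetical second set

def complement(S):
    """Complement of S relative to the universal set U."""
    return U - S

law_1 = complement(A | B) == complement(A) & complement(B)   # (A∪B)′ = A′∩B′
law_2 = complement(A & B) == complement(A) | complement(B)   # (A∩B)′ = A′∪B′

print(sorted(complement(A | B)))   # [5, 6, 8, 9, 10]
print(law_1, law_2)                # True True
```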


Some Important Results :
1. (A∪B)∩(A∪B′) = A.
For, (A∪B)∩(A∪B′) = A∪(B∩B′) = A∪φ = A.
2. (A∩B)∪(A∩B′) = A.
For, (A∩B)∪(A∩B′) = A∩(B∪B′) = A∩U = A.


De Morgan's Laws on Difference of Sets :


1. If A, B and C are three sets, then show that
A~(B∪C) = (A~B)∩(A~C)
[C.A. Entrance, May '75, Inter., May '76]
Let x ∈ A~(B∪C) ⇒ x ∈ A and x ∉ (B∪C)
⇒ x ∈ A and (x ∉ B and x ∉ C)
⇒ (x ∈ A and x ∉ B) and (x ∈ A and x ∉ C)
⇒ x ∈ (A~B) and x ∈ (A~C)
⇒ x ∈ (A~B)∩(A~C).
Thus, A~(B∪C) ⊂ (A~B)∩(A~C).

Again, let y ∈ (A~B)∩(A~C) ⇒ y ∈ (A~B) and y ∈ (A~C)
⇒ (y ∈ A and y ∉ B) and (y ∈ A and y ∉ C)
⇒ y ∈ A and (y ∉ B and y ∉ C)
⇒ y ∈ A and y ∉ (B∪C)
⇒ y ∈ A~(B∪C)

Thus, (A~B)∩(A~C) ⊂ A~(B∪C).

∴ A~(B∪C) = (A~B)∩(A~C).
By Venn Diagram :


[Venn diagram: the shaded region is A~(B∪C).]


2. If A, B and C are three sets, then show that
A~(B∩C) = (A~B)∪(A~C)
Let x ∈ A~(B∩C) ⇒ x ∈ A and x ∉ (B∩C)
⇒ x ∈ A and (x ∉ B or x ∉ C)
⇒ (x ∈ A and x ∉ B) or (x ∈ A and x ∉ C)
⇒ x ∈ (A~B) or x ∈ (A~C)
⇒ x ∈ (A~B)∪(A~C)
i.e., A~(B∩C) ⊂ (A~B)∪(A~C).
Again, let y ∈ (A~B)∪(A~C) ⇒ y ∈ (A~B) or y ∈ (A~C)
⇒ (y ∈ A and y ∉ B) or (y ∈ A and y ∉ C)
⇒ y ∈ A and (y ∉ B or y ∉ C)
⇒ y ∈ A and y ∉ (B∩C)
⇒ y ∈ A~(B∩C)
i.e., (A~B)∪(A~C) ⊂ A~(B∩C).
∴ A~(B∩C) = (A~B)∪(A~C).
This result may also be verified by Venn diagram as before. The
student may do it now.


Miscellaneous Results (on Union, Intersection and Difference).

1. Prove that A~B = A∩B′.

Let x ∈ A~B ⇒ x ∈ A and x ∉ B
⇒ x ∈ A and x ∈ B′
⇒ x ∈ A∩B′

Thus, A~B ⊂ A∩B′.
Again, let y ∈ A∩B′ ⇒ y ∈ A and y ∈ B′
⇒ y ∈ A and y ∉ B
⇒ y ∈ A~B
Thus, A∩B′ ⊂ A~B.
∴ A~B = A∩B′.
Note: The student may now try to show B~A = B∩A′.


2. Prove that (A~B)∩B = φ.
If possible, let x be an element belonging to (A~B)∩B;
then x ∈ (A~B)∩B ⇒ x ∈ (A~B) and x ∈ B
⇒ (x ∈ A and x ∉ B) and x ∈ B
⇒ x ∈ A and x ∉ B and x ∈ B, which is
absurd, since x ∈ B and x ∉ B cannot hold simultaneously.


∴ (A~B)∩B = φ.
3. Prove that A∩(B~C) = (A∩B)~C.
We are to show A∩(B~C) ⊂ (A∩B)~C and
(A∩B)~C ⊂ A∩(B~C).

For the first one, let x ∈ A∩(B~C)
⇒ x ∈ A and x ∈ (B~C)
⇒ x ∈ A and (x ∈ B and x ∉ C)
⇒ (x ∈ A and x ∈ B) and x ∉ C
⇒ x ∈ A∩B and x ∉ C
⇒ x ∈ (A∩B)~C

Thus, A∩(B~C) ⊂ (A∩B)~C.
For the second part, let y ∈ (A∩B)~C
⇒ y ∈ A∩B and y ∉ C
⇒ (y ∈ A and y ∈ B) and y ∉ C
⇒ y ∈ A and (y ∈ B and y ∉ C)
⇒ y ∈ A and y ∈ (B~C)
⇒ y ∈ A∩(B~C)
Thus, (A∩B)~C ⊂ A∩(B~C).
∴ A∩(B~C) = (A∩B)~C.
4. Prove that A∩(B~C) = (A∩B)~(A∩C).
Let x ∈ A∩(B~C)
⇒ x ∈ A and x ∈ (B~C)
⇒ x ∈ A and (x ∈ B and x ∉ C)
⇒ (x ∈ A and x ∈ B) and x ∉ C
⇒ (x ∈ A and x ∈ B) and x ∈ C′
⇒ x ∈ (A∩B)∩C′
⇒ x ∈ φ∪{(A∩B)∩C′}
⇒ x ∈ (φ∩B)∪{(A∩B)∩C′}
⇒ x ∈ {(A∩A′)∩B}∪{(A∩B)∩C′}
⇒ x ∈ {(A∩B)∩A′}∪{(A∩B)∩C′}
⇒ x ∈ (A∩B)∩(A′∪C′)
⇒ x ∈ (A∩B)∩(A∩C)′
⇒ x ∈ (A∩B)~(A∩C)
i.e., A∩(B~C) ⊂ (A∩B)~(A∩C).
Again let y ∈ (A∩B)~(A∩C)
⇒ y ∈ (A∩B) and y ∉ (A∩C)
⇒ y ∈ (A∩B) and y ∈ (A∩C)′
⇒ y ∈ (A∩B) and y ∈ (A′∪C′)
⇒ y ∈ {(A∩B)∩A′}∪{(A∩B)∩C′}
⇒ y ∈ {(A∩A′)∩B}∪{(A∩B)∩C′}
⇒ y ∈ (φ∩B)∪{(A∩B)∩C′}
⇒ y ∈ φ∪{(A∩B)∩C′}
⇒ y ∈ (A∩B)∩C′
⇒ y ∈ A∩(B∩C′)
⇒ y ∈ A∩(B~C)
i.e., (A∩B)~(A∩C) ⊂ A∩(B~C).
Thus, A∩(B~C) = (A∩B)~(A∩C).
5. Prove that (A~B)∪(B~A) = (A∪B)~(A∩B).
Let x ∈ (A~B)∪(B~A) ⇒ x ∈ (A~B) or x ∈ (B~A)
⇒ (x ∈ A and x ∉ B) or (x ∈ B and x ∉ A)
⇒ x ∈ A or x ∈ B, but x ∉ both A and B
⇒ x ∈ A∪B, but x ∉ A∩B
⇒ x ∈ (A∪B)~(A∩B)
Thus (A~B)∪(B~A) ⊂ (A∪B)~(A∩B).
Similarly, (A∪B)~(A∩B) ⊂ (A~B)∪(B~A).
∴ (A~B)∪(B~A) = (A∪B)~(A∩B).
Note: A Δ B (A symmetric difference B) = (A~B)∪(B~A).


6. For any two sets A, B, show that
A∪B = (A~B)∪B.


Let x ∈ A∪B ⇒ x ∈ A or x ∈ B
⇒ x ∈ B or x ∈ A
⇒ (x ∈ B or x ∈ A) and (x ∈ B or x ∉ B)
(step may be noted)
⇒ x ∈ B or (x ∈ A and x ∉ B)
⇒ x ∈ B or x ∈ (A~B)
⇒ x ∈ (A~B) or x ∈ B
⇒ x ∈ (A~B)∪B
i.e., A∪B ⊂ (A~B)∪B.


Again, let y ∈ (A~B)∪B ⇒ y ∈ (A~B) or y ∈ B
⇒ y ∈ B or y ∈ (A~B)
⇒ y ∈ B or (y ∈ A and y ∉ B)
⇒ (y ∈ B or y ∈ A) and (y ∈ B or y ∉ B)
⇒ y ∈ B or y ∈ A
⇒ y ∈ A or y ∈ B
⇒ y ∈ (A∪B)

i.e., (A~B)∪B ⊂ A∪B.

∴ A∪B = (A~B)∪B.
7. Prove that A ⊂ B if and only if A∪B = B.
Let us suppose A ⊂ B; we will show A∪B = B.
Let x ∈ A∪B ⇒ x ∈ A or x ∈ B
⇒ x ∈ B or x ∈ B (as A ⊂ B)
⇒ x ∈ B
i.e., A∪B ⊂ B.


Again, y ∈ B ⇒ y ∈ A∪B
i.e., B ⊂ A∪B.
∴ A∪B = B.
Conversely, if A∪B = B, then x ∈ A ⇒ x ∈ A∪B ⇒ x ∈ B, so that A ⊂ B.


Algebra of Sets


The algebra of sets deals with certain fundamental laws or
properties governing operations on sets. These are like the fundamental
laws of addition and multiplication in the ordinary algebra of numbers.


Only at certain stages are there differences, which are discussed below :


(i) Commutative law :
For real numbers, addition and multiplication are commutative,
i.e., a + b = b + a and a × b = b × a.
Union and intersection of sets are also commutative,
i.e., A∪B = B∪A and A∩B = B∩A.


(ii) Associative law :
Addition and multiplication of numbers are associative,
i.e., a + (b + c) = (a + b) + c, a × (b × c) = (a × b) × c.
Union and intersection of sets are also associative,
i.e., A∪(B∪C) = (A∪B)∪C,
A∩(B∩C) = (A∩B)∩C.
(iii) Distributive law :
In the algebra of numbers only one distributive law operates :
a × (b + c) = a × b + a × c,
but a + (b × c) ≠ (a + b) × (a + c) in general.
In the algebra of sets, we have
A∩(B∪C) = (A∩B)∪(A∩C)
(which is the same as in ordinary algebra shown above)
and A∪(B∩C) = (A∪B)∩(A∪C)
(this additional law holds good only in the algebra of sets).


(iv) Idempotent law :


This law also shows that the algebra of sets is not completely
analogous to the ordinary algebra of numbers.
In the algebra of numbers we have
a + a = 2a and a × a = a².
But for a set A,
A∪A = A and A∩A = A.


(v) Identity law :

In ordinary algebra 0 is taken as the identity element for addition,
i.e., a + 0 = a. In the algebra of sets, the union of the null set φ with any
set A is the set A itself, i.e., A∪φ = A.

Again, in ordinary algebra, 1 is taken as the identity element for
multiplication, since a × 1 = a. In the algebra of sets, A∩U = A, where
U is the universal set.

For such similarities (as shown) A∪B is known as the logical sum and
A∩B as the logical product.


(vi) Complement law :
In the algebra of numbers, if a is a proper fraction, then its complement
is 1 − a, where
a + (1 − a) = 1, but a × (1 − a) ≠ 0.
In the algebra of sets, for every subset A of the universal set U, there
is one and only one complement of A, i.e., A′, such that A∪A′ = U,
A∩A′ = φ.


(vii) De Morgan's law :
If A and B are two sets, then this law states that (A∪B)′ = A′∩B′
and (A∩B)′ = A′∪B′.


Partition of a Set.
A universal set U may have a number of disjoint subsets. If
these subsets are joined together again,
then the same universal set is formed, as
shown in the figure.
[Figure: a universal set U divided into disjoint subsets.]
In disjoint sets, no element is
common. If, however, we take into account
the elements which may be common to
two or more subsets, then we will find more partitions. In the case of three
subsets of a universal set, there will be eight partitions (i.e., 2³ = 8), as
shown by symbols and Venn diagram side by side.


By Symbols :


1. A∩B∩C
2. A∩B∩C′
3. A∩B′∩C
4. A′∩B∩C
5. A∩B′∩C′
6. A′∩B∩C′
7. A′∩B′∩C
8. A′∩B′∩C′


If again we take unions, intersections and complements of the
three subsets, we get more regions as follows :


Subsets                        Regions
A∪B∪C                          1, 2, 3, 4, 5, 6, 7
(A∪B)∩C or (A∩C)∪(B∩C)         1, 3, 4
A′∪(A∩C)                       1, 3, 4, 6, 7, 8
(A∩B)∪C                        1, 2, 3, 4, 7
(A∪B)∩C′                       2, 5, 6
A′∩(A∩C)                       None
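The eight regions and the table of combined subsets can be reproduced with a short sketch. The sets U, A, B and C below are hypothetical, chosen so that each of the eight regions holds exactly one element; the numbering 1 to 8 follows the list of symbols given above, and the function name `regions` is introduced here only for illustration.

```python
def regions(A, B, C, U):
    """Split U into the eight regions cut out by A, B and C, numbered 1 to 8."""
    Ac, Bc, Cc = U - A, U - B, U - C      # complements relative to U
    return {
        1: A & B & C,
        2: A & B & Cc,
        3: A & Bc & C,
        4: Ac & B & C,
        5: A & Bc & Cc,
        6: Ac & B & Cc,
        7: Ac & Bc & C,
        8: Ac & Bc & Cc,
    }

# Hypothetical sets: every region contains exactly one element.
U = set(range(1, 9))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {2, 4, 6, 7}

r = regions(A, B, C, U)

def join(*numbers):
    """Union of the regions with the given numbers."""
    return set().union(*(r[k] for k in numbers))

table_ok = (
    (A | B | C) == join(1, 2, 3, 4, 5, 6, 7)
    and (A | B) & C == join(1, 3, 4)
    and (U - A) | (A & C) == join(1, 3, 4, 6, 7, 8)
    and (A & B) | C == join(1, 2, 3, 4, 7)
    and (A | B) & (U - C) == join(2, 5, 6)
    and (U - A) & (A & C) == set()
)
print(table_ok)   # True
```

Because the eight regions are mutually disjoint and together make up U, every set built from A, B and C by union, intersection and complement is a union of some of them.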


Number of Elements in a Set.


When operations are performed on finite sets, some new subsets are
formed. In this section we will find the number of elements in these new subsets.
If A is a finite set, we shall denote by n(A) the number of elements
in A, which may be obtained by actual
counting. But for unions of two or
more sets, we have different formulas.


(1) FOR UNION OF TWO SETS :


For two sets A and B which are
not disjoint,
n(A∪B) = n(A) + n(B) − n(A∩B)
Proof : From the Venn diagram, we observe that A∪B is the
union of three mutually disjoint sets (A−B), A∩B and (B−A).
∴ n(A∪B) = n(A−B) + n(A∩B) + n(B−A) ... (1)
Since A and B are finite sets, let us assume that n(A) = x, n(B) = y
and n(A∩B) = z; then n(A−B) = x − z and n(B−A) = y − z.
∴ from (1), we get
n(A∪B) = x − z + z + y − z = x + y − z = n(A) + n(B) − n(A∩B) ... (a)
If A∩B = φ then n(A∪B) = n(A) + n(B),
i.e., if A and B are disjoint sets, then n(A∪B) = n(A) + n(B).


(2) FOR UNION OF THREE SETS :


Let A, B and C be three sets (not mutually disjoint); then
n(A∪B∪C) = n[A∪(B∪C)]
= n(A) + n(B∪C) − n[A∩(B∪C)]
= n(A) + n(B) + n(C) − n(B∩C) − n[(A∩B)∪(A∩C)]
(distributive law)
= n(A) + n(B) + n(C) − n(B∩C)
− [n(A∩B) + n(A∩C) − n(A∩B∩C)],
since (A∩B)∩(A∩C) = A∩B∩C
= n(A) + n(B) + n(C) − n(A∩B) − n(B∩C) − n(C∩A)
+ n(A∩B∩C).
If again A, B and C are mutually disjoint, then we find
n(A∪B∪C) = n(A) + n(B) + n(C).

Splitting :
The basic set A, B or C may be split up into a number of intermediate
sub-groups or sets, as follows :
n(A) = n(A∩B) + n(A∩B′)
= n(A∩B∩C) + n(A∩B∩C′) + n(A∩B′∩C) + n(A∩B′∩C′).
n(A) can also be split up with respect to C, i.e., n(A) = n(A∩C) + n(A∩C′).
Now the subsets of the union of three sets will be
n(A∪B∪C) = n(A∩B′∩C′) + n(A∩B∩C′) + n(B∩A′∩C′)
+ n(A∩B′∩C) + n(A∩B∩C) + n(A′∩B∩C) + n(A′∩B′∩C)
[the residual subset is n(A′∩B′∩C′)]


Re-grouping :

The subsets of a universal set may be re-grouped with other
subsets. Re-grouping or splitting (shown before) of subsets will depend
on the nature of the problem. The process of re-grouping is shown below
(with reference to the above diagram of the sets A, B and C) in a few cases.

n(A∩C′) = n(A) − n(A∩C)
n(B∩A′) = n(B) − n(B∩A)
n(C∩B′) = n(C) − n(C∩B)
n(A∩B) = n(A∩B∩C) + n(A∩B∩C′)
n(B∩C) = n(A∩B∩C) + n(B∩C∩A′)
or n(B∩C∩A′) = n(B∩C) − n(A∩B∩C)
n(A∩B′∩C′) = n(A) − n(A∩B) − n(A∩C) + n(A∩B∩C)
n(B∩C′∩A′) = n(B) − n(B∩C) − n(B∩A) + n(A∩B∩C)
n(A) = n(A∩B) + n(A∩B′)
n(C) = n(B∩C) + n(C∩B′).
Worked out Examples :


1. In a class of 100 students, 45 students read Physics, 52
students read Chemistry and 17 students read both the subjects. Find
the number of students who study neither Physics nor Chemistry.


Let A and B be the sets of students who read Physics and Chemistry
respectively. We know n(A∪B) = n(A) + n(B) − n(A∩B).

Here n(A) = 45, n(B) = 52, n(A∩B) = 17.

So, n(A∪B) = 45 + 52 − 17 = 97 − 17 = 80.
We are to find n(A′∩B′), which is 100 − 80 = 20.

2. In a class of 50 students, 15 read Physics, 20 read Chemistry
and 20 read Mathematics. 3 read Physics and Chemistry, 6 read
Chemistry and Mathematics, and 5 read Physics and Mathematics.
7 read none of the three subjects. How many students read all the
three subjects ?

Let A, B and C be the sets of students who read Physics, Chemistry and
Mathematics respectively. Since 7 students read none of the subjects,
n(A∪B∪C) = 50 − 7 = 43. Also n(A) = 15, n(B) = 20, n(C) = 20,
n(A∩B) = 3, n(B∩C) = 6, n(C∩A) = 5, n(A∩B∩C) = ?
Using the formula,
n(A∪B∪C) = n(A) + n(B) + n(C) − n(A∩B) − n(B∩C)
− n(C∩A) + n(A∩B∩C)
or, 43 = 15 + 20 + 20 − 3 − 6 − 5 + n(A∩B∩C)
or, 43 = 41 + n(A∩B∩C)
∴ required no. of students = n(A∩B∩C) = 2.
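The arithmetic of the first two worked examples can be retraced in a few lines. The sketch takes the students who read at least one subject in Example 2 as 50 − 7 = 43, and takes 3 as the number reading both Physics and Chemistry, as used in the solution; the variable names are illustrative only.

```python
# Example 1: 100 students, Physics 45, Chemistry 52, both 17.
total_1 = 100
physics, chemistry, both = 45, 52, 17
at_least_one_1 = physics + chemistry - both      # n(A∪B)
neither = total_1 - at_least_one_1               # n(A′∩B′)

# Example 2: 50 students, 7 of whom read none of the three subjects.
class_size, read_none = 50, 7
n_p, n_c, n_m = 15, 20, 20
n_pc, n_cm, n_pm = 3, 6, 5
at_least_one_2 = class_size - read_none          # n(A∪B∪C)
all_three = at_least_one_2 - (n_p + n_c + n_m - n_pc - n_cm - n_pm)

print(at_least_one_1, neither)   # 80 20
print(all_three)                 # 2
```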

3. An enquiry into 1000 candidates who failed in the C.A. final
examination revealed the following data :

658 failed in aggregate, 166 failed in aggregate and Group I,
372 failed in Group I, 434 failed in aggregate and Group II,
590 failed in Group II, 126 failed in both Groups.


Find how many candidates failed in :
(a) all the three,
(b) aggregate but not in Group II,
(c) Group I but not in aggregate,
(d) Group II but not in Group I,
(e) aggregate but not in Group I and Group II.


Let A, B and C be the sets of candidates who have failed in
aggregate, Group I and Group II respectively.


(a) n(A∪B∪C) = n(A) + n(B) + n(C) − n(A∩B) − n(B∩C)
− n(A∩C) + n(A∩B∩C)
⇒ 1000 = 658 + 372 + 590 − 166 − 126 − 434
+ n(A∩B∩C)
or, n(A∩B∩C) = 106.


(b) Failed in aggregate but not in Group II = n(A∩C′)
n(A∩C′) = n(A) − n(A∩C)
[as A∩C′ and A∩C are disjoint sets]
= 658 − 434 = 224.


(c) Failed in Group I but not in aggregate = n(B∩A′)
n(B∩A′) = n(B) − n(B∩A)
[as B∩A′ and B∩A are disjoint sets]
= 372 − 166 = 206.


(d) Failed in Group II but not in Group I = n(C∩B′)
n(C∩B′) = n(C) − n(C∩B)
= 590 − 126 = 464.


(e) Failed in aggregate but not in Group I and Group II
n(A∩B′∩C′) = n(A) − n(A∩B) − n(A∩C)
+ n(A∩B∩C)
= 658 − 166 − 434 + 106 ... from (a)
= 164.
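The five answers can be recomputed from the six given figures. In this sketch A, B and C stand for failure in aggregate, Group I and Group II respectively, as in the solution, and it is assumed (as the solution does) that all 1000 candidates failed in at least one of the three; the variable names are illustrative.

```python
total = 1000
n_a, n_b, n_c = 658, 372, 590           # aggregate, Group I, Group II
n_ab, n_ac, n_bc = 166, 434, 126        # pairwise overlaps

# (a) from n(A∪B∪C) = n(A)+n(B)+n(C) − n(A∩B) − n(B∩C) − n(A∩C) + n(A∩B∩C)
n_abc = total - (n_a + n_b + n_c - n_ab - n_bc - n_ac)

a_not_c = n_a - n_ac                    # (b) n(A∩C′)
b_not_a = n_b - n_ab                    # (c) n(B∩A′)
c_not_b = n_c - n_bc                    # (d) n(C∩B′)
a_only = n_a - n_ab - n_ac + n_abc      # (e) n(A∩B′∩C′)

print(n_abc, a_not_c, b_not_a, c_not_b, a_only)   # 106 224 206 464 164
```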


USE OF VENN DIAGRAM :


4. In a survey of 150 students, it was found that 40 students
studied Physics, 60 students studied Chemistry and 50 students
studied Mathematics, and 15 students studied all the three subjects,
27 students studied Physics and Chemistry, 35 students studied
Chemistry and Mathematics and 25 students studied Physics and Mathematics.
Find the number who studied only Physics and the number who
studied none of these subjects.
Let P, C and M represent the sets of students who studied Physics,
Chemistry and Mathematics respectively.
Now, n(P∩C∩M) = 15
n(P∩C) = 27, so Physics and Chemistry only = 27 − 15 = 12
n(C∩M) = 35, so Chemistry and Mathematics only = 35 − 15 = 20
n(P∩M) = 25, so Physics and Mathematics only = 25 − 15 = 10


[Venn diagram: three overlapping circles P(40), C(60) and M(50) with
the region counts entered.]
Students studying Physics only
= 40 − (12 + 15 + 10) = 3


(see the diagram)
Similarly, Chemistry only = 60 − (12 + 15 + 20) = 13 and
Mathematics only = 50 − (10 + 15 + 20) = 5.


Number of students who studied one or more subjects
= 15 + 12 + 10 + 20 + 3 + 13 + 5 = 78

∴ number of students who studied none
= 150 − 78 = 72.
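The region-by-region counting used with the Venn diagram can be followed in code: start from the innermost region and work outwards. The sketch below uses the figures of this example (150 students surveyed); the variable names are chosen here for illustration.

```python
surveyed = 150
n_p, n_c, n_m = 40, 60, 50                    # Physics, Chemistry, Mathematics
n_pc, n_cm, n_pm, n_pcm = 27, 35, 25, 15

# Regions shared by exactly two subjects.
pc_only = n_pc - n_pcm
cm_only = n_cm - n_pcm
pm_only = n_pm - n_pcm

# Regions belonging to one subject alone.
p_only = n_p - (pc_only + pm_only + n_pcm)
c_only = n_c - (pc_only + cm_only + n_pcm)
m_only = n_m - (pm_only + cm_only + n_pcm)

at_least_one = n_pcm + pc_only + cm_only + pm_only + p_only + c_only + m_only
none = surveyed - at_least_one

print(p_only, at_least_one, none)   # 3 78 72
```

The same total of 78 is obtained from the formula for the union of three sets: 40 + 60 + 50 − 27 − 35 − 25 + 15.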


5. Out of a certain set of 200 students, 40 read German, 76 read
French and 82 read Spanish. 36 read exactly two of these languages,
but none read all the three. 34 read German but not Spanish and
10 read both German and French. Find how many of the original set
fail to read any of the three languages, and how many students read
German only.


Sum of the numbers of students reading the three languages
= 40 + 76 + 82 = 198.


Here, there is no student reading
all the three languages together, so there
will be no common region between the
three circles, which means the three circles
will meet at a point.


[Venn diagram: circles G(40), F(76) and S(82); x₁ denotes the number
reading German and Spanish, x₂ the number reading French and Spanish.]


Now 36 students read exactly two
languages, i.e., German and French, French
and Spanish, or German and Spanish.


Again 10 students read German and French.

So, x₁ + x₂ = 36 − 10 = 26.

Again 34 students read German but not Spanish.

So, x₁ = 40 − 34 = 6, ∴ x₂ = 26 − 6 = 20.

Number of students reading German only = 40 − 10 − 6 = 24.


In the sum 198, each student reading exactly two languages is counted
twice. So the number of students reading at least one language = 198 − 36 = 162.


∴ students who fail to read any of the three languages
= 200 − 162 = 38.


Cartesian Product.


Let A and B be two given sets. If a ∈ A, b ∈ B then (a, b) denotes
an ordered pair; a is regarded as the first element and b the second
element, so that (a, b) is not the same as (b, a). In the case of a set, {a, b}
= {b, a}.


Two ordered pairs (a, b) and (c, d) will be equal if and only
if a = c and b = d,
i.e., (a, b) = (c, d) ⇔ a = c, b = d.


Definition :


If A and B are two sets, then the set of all ordered pairs (a, b)
such that a ∈ A, b ∈ B is called the cartesian product of A and B and is
denoted by A × B.

In symbols :

A × B = {x : x = (a, b), a ∈ A, b ∈ B}


EXAMPLE :
A = {1, 2, 3}, B = {1, 2}
A × B = {(1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2)}


EXAMPLE :
If A = {1, 2, 3}, B = {2, 3},
prove that A × B ≠ B × A. [C.A. Entr., May '74]
A × B = {(1, 2), (1, 3), (2, 2), (2, 3), (3, 2), (3, 3)}
B × A = {(2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)}.
We find that the elements (1, 2), (1, 3) of A × B are not elements
of B × A.
∴ A × B ≠ B × A.
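The cartesian product of two finite sets can be listed with `itertools.product` from the Python standard library, which yields every ordered pair as a tuple. The sketch below repeats the second example, with A = {1, 2, 3} and B = {2, 3}.

```python
from itertools import product

A = {1, 2, 3}
B = {2, 3}

AxB = set(product(A, B))    # all ordered pairs (a, b) with a in A, b in B
BxA = set(product(B, A))    # all ordered pairs (b, a) with b in B, a in A

only_in_AxB = AxB - BxA     # pairs of A × B that are not in B × A

print(sorted(AxB))          # [(1, 2), (1, 3), (2, 2), (2, 3), (3, 2), (3, 3)]
print(sorted(only_in_AxB))  # [(1, 2), (1, 3)]
print(AxB == BxA)           # False
```

The number of pairs is len(A) × len(B) = 6 in either order, even though the two products are different sets.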


Properties :


1. Since two ordered pairs (a, b), (b, a) are unequal, the cartesian
product is not commutative,
i.e., A × B ≠ B × A unless A = B or one set is empty.


2. If set A has m elements and set B has n elements, then the set
A × B has mn elements.


3. A × B is empty if either A or B is empty.


4. A × B is infinite if either A or B is infinite and the other is
non-empty.


Cartesian Product of n Sets :
Let A₁, A₂, ..., Aₙ be n sets. The set of ordered n-tuples
(a₁, a₂, ..., aₙ), where aᵢ ∈ Aᵢ, i = 1, 2, ..., n, is known as the cartesian
product of A₁, A₂, ..., Aₙ and is denoted by A₁ × A₂ × ... × Aₙ.
EXAMPLE :


If A = {2, 3}, B = {1, 3}, C = {3, 4},
find (i) A × (B∪C) (ii) A × (B∩C) (iii) (A × B)∪(B × C).


(i) A × (B∪C) = {2, 3} × ({1, 3} ∪ {3, 4})
= {2, 3} × {1, 3, 4}
= {(2, 1), (2, 3), (2, 4), (3, 1), (3, 3), (3, 4)}.
(ii) A × (B∩C) = {2, 3} × ({1, 3} ∩ {3, 4})
= {2, 3} × {3} = {(2, 3), (3, 3)}.
(iii) A × B = {(2, 1), (2, 3), (3, 1), (3, 3)}
B × C = {(1, 3), (1, 4), (3, 3), (3, 4)}
∴ (A × B)∪(B × C) = {(2, 1), (2, 3), (3, 1), (3, 3), (1, 3), (1, 4), (3, 4)}.
Useful Results.


1. Prove that (i) A ⊂ B and C ⊂ D ⇒ (A × C) ⊂ (B × D)
(ii) A × (B∪C) = (A × B)∪(A × C)
[C.A. Entr., Nov. '74]
(iii) A × (B∩C) = (A × B)∩(A × C)
(iv) (A × B)∩(S × T) = (A∩S) × (B∩T)


(i) Let (a, c) be any element in (A × C);
then (a, c) ∈ (A × C) ⇒ a ∈ A and c ∈ C
⇒ a ∈ B and c ∈ D, as A ⊂ B, C ⊂ D
⇒ (a, c) ∈ (B × D)
∴ (A × C) ⊂ (B × D).
(ii) Let (x, y) ∈ A × (B∪C)
⇒ x ∈ A and y ∈ (B∪C)
⇒ x ∈ A and (y ∈ B or y ∈ C)
⇒ (x ∈ A and y ∈ B) or (x ∈ A and y ∈ C)
⇒ (x, y) ∈ (A × B) or (x, y) ∈ (A × C)
⇒ (x, y) ∈ (A × B)∪(A × C)
∴ A × (B∪C) ⊂ (A × B)∪(A × C).
Again, let (u, v) ∈ (A × B)∪(A × C)
⇒ (u, v) ∈ (A × B) or (u, v) ∈ (A × C)
⇒ (u ∈ A and v ∈ B) or (u ∈ A and v ∈ C)
⇒ u ∈ A and (v ∈ B or v ∈ C)
⇒ (u, v) ∈ A × (B∪C)
∴ (A × B)∪(A × C) ⊂ A × (B∪C).
∴ A × (B∪C) = (A × B)∪(A × C).


(iii) Similar to above, and it is left to the students.
(iv) Let (x, y) ∈ (A × B)∩(S × T)
⇒ (x, y) ∈ (A × B) and (x, y) ∈ (S × T)
⇒ (x ∈ A and y ∈ B) and (x ∈ S and y ∈ T)
⇒ (x ∈ A and x ∈ S) and (y ∈ B and y ∈ T)
⇒ x ∈ (A∩S) and y ∈ (B∩T)
⇒ (x, y) ∈ (A∩S) × (B∩T)
∴ (A × B)∩(S × T) ⊂ (A∩S) × (B∩T).
Again, let (u, v) ∈ (A∩S) × (B∩T)
⇒ u ∈ (A∩S) and v ∈ (B∩T)
⇒ (u ∈ A and u ∈ S) and (v ∈ B and v ∈ T)
⇒ (u, v) ∈ (A × B) and (u, v) ∈ (S × T)
⇒ (u, v) ∈ (A × B)∩(S × T)
∴ (A∩S) × (B∩T) ⊂ (A × B)∩(S × T).
∴ (A × B)∩(S × T) = (A∩S) × (B∩T).


2. If A ⊂ B, show that (A × A) ⊂ (A × B)∩(B × A).
Let (x, y) ∈ (A × A);

then x ∈ A and y ∈ A. Now as A ⊂ B by hypothesis,
x ∈ A ⇒ x ∈ B and y ∈ A ⇒ y ∈ B.


(x, y) ∈ (A × A) ⇒ x ∈ A and y ∈ A
⇒ x ∈ A and y ∈ B
⇒ (x, y) ∈ (A × B).


Again, (x, y) ∈ (A × A) ⇒ x ∈ A and y ∈ A
⇒ x ∈ B and y ∈ A
⇒ (x, y) ∈ (B × A).
Thus (x, y) ∈ (A × A) ⇒ (x, y) ∈ (A × B) and also (x, y) ∈ (B × A)
⇒ (x, y) ∈ (A × B)∩(B × A)
∴ (A × A) ⊂ (A × B)∩(B × A).


3. For any three sets A, B and C, show that
A × (B~C) = (A × B)~(A × C).
Let (x, y) ∈ A × (B~C) ⇒ x ∈ A and y ∈ (B~C)
⇒ x ∈ A and (y ∈ B and y ∉ C)
⇒ (x ∈ A and y ∈ B) and (x ∈ A and y ∉ C)
⇒ (x, y) ∈ (A × B) but (x, y) ∉ (A × C)
⇒ (x, y) ∈ (A × B)~(A × C)
∴ A × (B~C) ⊂ (A × B)~(A × C).
Again, let (u, v) ∈ (A × B)~(A × C)
⇒ (u, v) ∈ (A × B) but (u, v) ∉ (A × C)
⇒ (u ∈ A and v ∈ B) and (u ∈ A and v ∉ C)
⇒ u ∈ A and (v ∈ B and v ∉ C)
⇒ u ∈ A and v ∈ (B~C)
⇒ (u, v) ∈ A × (B~C)
∴ (A × B)~(A × C) ⊂ A × (B~C).
Thus, A × (B~C) = (A × B)~(A × C).
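The identities for cartesian products proved above can be tried on small sets. In the sketch below, A, B and C are the sets of the earlier worked example, while S and T are hypothetical sets added for identity (iv); `cart` is a helper name introduced here. Checking particular sets only illustrates the results and does not prove them.

```python
from itertools import product

def cart(X, Y):
    """Cartesian product X × Y as a set of ordered pairs."""
    return set(product(X, Y))

A, B, C = {2, 3}, {1, 3}, {3, 4}
S, T = {3, 5}, {3, 7}            # hypothetical sets for identity (iv)

checks = {
    "A x (B ∪ C) = (A x B) ∪ (A x C)": cart(A, B | C) == cart(A, B) | cart(A, C),
    "A x (B ∩ C) = (A x B) ∩ (A x C)": cart(A, B & C) == cart(A, B) & cart(A, C),
    "A x (B ~ C) = (A x B) ~ (A x C)": cart(A, B - C) == cart(A, B) - cart(A, C),
    "(A x B) ∩ (S x T) = (A ∩ S) x (B ∩ T)": cart(A, B) & cart(S, T) == cart(A & S, B & T),
}

for identity, holds in checks.items():
    print(identity, holds)
```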


EXERCISE 18
1. Given A = {1, 2, 3, 4, 5}, B = {2, 4, 6},
C = {3}, D = {0, 1, 2, ..., 9}.
Find (i) A∪C, (ii) A∪(B∪C), (iii) B∩C, (iv) C∩D,
(v) A∩(B∩C), (vi) (A∩B)∩C, (vii) A Δ B.
2. Given U (universal) = {0, 1, ..., 9}, A = {2, 4, 6},
B = {1, 3, 5, 7}, C = {6, 7}.
Find (i) A∩B, (ii) (A∪B)~C,
(iii) (A∪C), (iv) (A∩U)∩(B∪C).
3. If S be the set of all prime numbers and M = {0, 1, 2, ..., 9},
exhibit (i) S∩M, (ii) M−(S∩M).
4. Let A = {a, b, d}, B = {d}, C = {c, d},
D = {a, b, d}, E = {a, b}.
Determine if the following statements are true :
(i) E ⊂ A, (ii) B ⊂ C, (iii) A ⊂ D, (iv) C ⊂ D,
(v) B = C, (vi) B ⊃ C, (vii) A ~ D.
5. Determine which of the following sets are the same :
A = {5, 7, 6}, B = {6, 8, 7}, C = {5, 6, 7},
D = {x : x is an integer greater than 2 but less than 6},
E = {1, 3, 3, 4, 5, 6}, F = {3, 4, 5}.
6. Hill up the blanks by appropriate symbol 
(el Sema ch) 
(i) 8 preter {8, 4$U{4, 5, 6f 
(ii) {6}++++ {5, 646, 7, 8} 
(iii) {8, 4, B}e++-+-42, 8, 4FU4B, 4, 5} 
(iv) fa, Bhs fa} 
(v) 4:48, SUIS, 6, 7} 
(iv) 41, 2, 2, B}+r7--48, 2, 1}. 
7. Find the power set P(A) of the set A = {a, b, c}.
8. Indicate which of the sets is a null set :
X = {x : x² = 4, 3x = 12}
Y = {x : x + 7 = 7}
Z = {x : x ≠ x}.
9. If U = {x : x is a letter in the English alphabet}
V = {x : x is a vowel}
W = {x : x is a consonant}
Y = {x : x is e or any letter before e in the alphabet}
Z = {x : x is e or any one of the next four letters}
Find each of the following sets :
(i) U∩V, (ii) V∩Y, (iii) Y∩Z,
(iv) Z∩W′, (v) U∩(W∩Y).

10. Given A = {x : x ∈ N and x is divisible by 3}
B = {x : x ∈ N and x is divisible by 3}
C = {x : x ∈ N and x is divisible by 4}

Describe A∩(B∩C).

11. If A = {x : x ∈ N and x < 6}
B = {x : x ∈ N and 3 < x < 8}
U = {x : x ∈ N and x ≤ 10}
Find the elements of the following sets, with remarks if any :
(i) (A∪B)′, (ii) A′∩B′, (iii) (A∩B)′, (iv) A′∪B′.
12. If S be any set, P(S) its power set and if A and B belong to
P(S), then show that B∩(A~B) = φ.
13. For any sets A and B, show that
(i) A~B = A~(A∩B)
(ii) A′~B′ = B~A.
14. For any three sets A, B and C, show that
A∪(B~C) ≠ (A∪B)~(A∪C).
15. Let A = {a, b}, B = {d, c}, C = {d, e}.
Find (i) A × (B∪C)
(ii) (A∩B) × C
(iii) (A × B)∩(A × C).
16. If A = {1, 4}, B = {2, 3}, C = {3, 5},
prove that A × B ≠ B × A. [C.A. Entr., May '75]
Also find (A × B)∩(A × C).
17. If A = {1, 2, 3}, B = {2, 3, 4}, S = {1, 3, 4}, T = {2, 4, 5},
verify that (A × B)∩(S × T) = (A∩S) × (B∩T).
[C.A. Entr., Nov. '76]


18. In a class of 30 students, 15 students have taken English,
10 students have taken English but not French. Find the number of
students who have taken (i) French, and (ii) French but not English.

(Ans.: 20, 15)


19. In a survey of 320 persons, the number of persons taking tea
is 210, taking milk 100 and coffee 70. The number of persons who
take tea and milk is 50, milk and coffee is 30 and tea and coffee is 50.
The number of persons taking all the three together is 20. Find
the number of persons who take neither tea, nor milk, nor coffee.


(Ans.: 50)
20. Out of 440 boys in a College, 112 boys read German, 120 read
French and 168 Spanish. Of these 32 read French and Spanish, 40 read
German and Spanish, 20 read German and French, while 12 read all
the three languages. How many boys


(i) did not read any language,
(ii) read just one language ? (Ans.: 120, 252)
21. In a survey of 100 students, the number of students studying
various languages is as follows :


German only 18, German but not Spanish 23, German and
French 8, German 26, French 48, French and Spanish 8, no
language 24.


Find (i) how many students took Spanish ?


(ii) how many took German and Spanish but not French ?
(Ans.: 18, 0)


22. A reporter supplied the following data about another set of
100 boys:

All three languages 5, German and Spanish 10, French and
Spanish 8, German and French 20, Spanish 30, German 23, French 50.

The reporter was dismissed, why? (Ans.: Negative number he


23. In a survey concerning the smoking habits of consumers
it was found that 55% smoke cigarette A, 50% smoke B, 42% smoke C,
28% smoke A and B, 20% smoke A and C, 12% smoke B and C and
10% smoke all the three cigarettes.
(i) What percentage do not smoke ?
(ii) What percentage smoke exactly 2 brands of cigarettes ?
(Ans.: 3%, 30%)


24. A company studies the product preferences of 10,000
consumers. It was found each of the products A, B, C was liked
by 5000, 3470 and 4830 respectively and all the products were liked
by 500; products A and B were liked by 1000, products B and C were
liked by $00 and products C and A were liked by 1400. Prove that the
study results are not correct. It was found that an error was made in
recording the number of consumers liking the products B and C.
What is the value of this number ? (Ans.: 1400)


25. In a survey of 1000 boys, the playing of the outdoor games football,
cricket and hockey was recorded. Each boy plays at least one of the
games. 400 boys did not play hockey, 370 did not play cricket and 550
did not touch football; 300 played football and cricket, 270 played
both cricket and hockey, 200 both hockey and football. How many
boys played all the three games ? How many played only football ?
(Ans.: 90)


26. In a close quarter battle 60% of combatants lost an eye,
75% lost an arm and 80% lost a leg. If each of the combatants has at
least one of these types of loss and the percentage of combatants having
loss of exactly two types at the same time is 65, calculate
the percentage of combatants having the three types of loss at the
same time. (Ans.: 25%)


Introduction and Meaning


In nature or in our day-to-day life, situations arise in which we
cannot predict with absolute certainty the exact occurrence of
any future event. We may only hope, speculate or guess, with or without
reason, about the happening of an event. The likelihood of
the occurrence is expressed by the term Probability, and a distinct
branch of Mathematics has gradually developed since the late sixteenth
century, formulating different theories of probability.


Initially the applications of probability theory were restricted
to games of chance. But in course of time they are being incorporated
in the business processes and decision-making apparatus of
business firms, governments, and professional and non-profit organizations.


Predictions of demand for a new product, estimations of
production costs, forecasting crop failures, buying insurance,
preparation of a budget, etc., are all better assessed with the help
of the Mathematics of probability, as they have some element of chance
inherent in them. The advantage of probability theory lies in the
fact that it has the ability to quantify “how likely” some event is. We
have three different ways of calculating or computing probabilities:
two objective methods and one subjective method. The
objective ones are the classical approach and the empirical
approach respectively.


The classical approach, which forms the subject-matter of this
chapter, is generally made when the given situations have equally likely
outcomes. Games of chance, which often involve coin-tossing, rolling
dice, or drawing cards, usually have this characteristic of equally likely
outcomes.


The empirical approach is based on the relative frequency of 
occurrence of an event over a large number of repeated trials. When 
this approach is made, the following important points should be noted : 


1. The probability so determined is only an estimate of the
actual value.

2. The larger the number of trials, the better the estimate of the
probability.

3. The trials should be conducted under identical conditions.


The subjective method is based on an individual’s personal feeling,
such as the judgement that ‘there is a 90% chance that it will rain
tomorrow’ or that ‘there is a better than 50% chance that a labour
strike will be settled this time’.


We will not, however, make use of this method in this text. 


Random Experiment


A random experiment is an experiment all of whose possible results
(outcomes) are known and which can be repeated under identical
conditions, but in which it is not possible to predict the outcome of any
particular trial in advance.

Note. Any particular performance of a random experiment is called a trial.

Tossing of a coin is a noteworthy example of a random
experiment: the outcomes are two in number (a head or a tail appears)
but exact prediction is not possible in any tossing.

Similarly, throwing of a die is also a random experiment with six
outcomes: either 1, 2, 3, 4, 5 or 6 will turn up but here, as before,
exact prediction is impossible in any throwing.

Any process of observation in business, as for example, the 
production of a commodity on different days or price of a commodity 
in different months may be taken as outcomes of a random experiment. 


Events 
The possible outcomes of a random experiment are called events.


Elementary and Compound (Composite) events : 


Elementary (simple) event is an outcome of an experiment that 
cannot be decomposed further, whereas Compound (composite) event 
is an aggregate of some elementary or simple events and is decomposable 
into simple events. 

In the experiment of tossing a coin, one simple event is ‘head’ 
and the other is ‘tail’ but the event ‘head or tail’ is a compound 
(composite) event since it can be decomposed into two simple events the 
event ‘head’ and the event ‘tail’. In tossing two coins, the event ‘both 
heads’ (HH) is a simple event and so also the event ‘both tails’ (TT); 
but the event ‘one head and one tail’ is a compound event consisting of 
two events (HT) and (TH). [ H is for head and T for tail. ] 

In throwing two dice the event ‘total 12 points’, viz. (6, 6), is a
simple event but the event ‘total 7 points’ is a compound (composite)
event consisting of 6 simple events (1, 6), (2, 5), (3, 4), (4, 3), (5, 2) and
(6, 1).


Mutually exclusive events :


Two events are said to be mutually exclusive or incompatible
when the occurrence of one of them excludes the occurrence of the
other or, in other words, two events are mutually exclusive if both
cannot occur simultaneously.

If a single coin is tossed, the ‘head’ and the ‘tail’ cannot occur in
the same trial. Hence the event ‘head’ and the event ‘tail’ are mutually
exclusive. If two coins are tossed, the events (HH), (HT), (TH) and
(TT) are mutually exclusive.

In drawing a single card from a pack of well-shuffled 52 cards
the events ‘card is a spade’ and ‘card is a club’ are mutually exclusive,
because a card cannot both be a spade and a club. But the events
‘card is a spade’ and ‘card is a face card’ are not mutually exclusive,
since some spade cards are face cards.


Exhaustive events :


Events are said to be exhaustive if at least one of them must
necessarily occur.

The totality of all possible outcomes of a random experiment
will constitute an exhaustive set of events.

Thus, in tossing of a coin there are two exhaustive events, ‘head’
and ‘tail’, and in throwing of a die the exhaustive events are six: either
1, 2, 3, 4, 5 or 6. In drawing a single card from a pack of 52 cards the
events ‘card is red’ and ‘card is black’ are collectively exhaustive.


Equally likely events :


Events are said to be equally likely if, after taking into account all
relevant evidence, no one of the events can be expected to occur in
preference to the other events, that is, when one does not occur more
often than the other.
In tossing of a coin, ‘head’ and ‘tail’ are equally likely events.
All the six faces of a die are equally likely events, when it is thrown.
All the 52 cards of a well-shuffled pack of cards are equally likely
events when one card is drawn.


Certain and Impossible events : 


An event is called certain or sure when all the possible outcomes
of an experiment are favourable to the event, whereas an event is 
called impossible when none of the outcomes is favourable to the event. 


Classical definition of Probability.


This concept of probability happens to be the most primitive one
and depends upon the notion of equally likely events. If for a random
experiment there are n (finite) mutually exclusive, exhaustive and
equally likely outcomes and r of them are favourable to an event A,
then the probability of the event A is defined and denoted by

$$P(A) = \frac{r}{n}$$

Thus, when the possible outcomes are equally likely, the
ratio of the number of ways an event A can occur to the total number
of possible outcomes is the probability of the event A.

Now, by definition, P(A) must always lie between 0 and 1, since
$0 \le r \le n$. When r = n, i.e., when the event is certain, P(A) = 1, and when r = 0, i.e.,
when the event is impossible, P(A) = 0. So, $0 \le P(A) \le 1$.


For example, the probability of the event ‘the sun will not rise tomorrow’
is zero. The event is an impossible event. And the probability of the
event ‘man will die some day’ is 1. This is a certain event.


Remark. If the event not-A is denoted by $\bar{A}$, then from the
definition of probability it follows that $P(\bar{A}) = \frac{n-r}{n} = 1 - \frac{r}{n} = 1 - P(A)$.


Odds.
The ratio of the probabilities for A and for $\bar{A}$ is often called odds in
favour of the event A.


Odds in favour of the event A $= \frac{P(A)}{P(\bar{A})} = \frac{r/n}{(n-r)/n} = \frac{r}{n-r}$


And the ratio of the probabilities for $\bar{A}$ and for A is called odds
against the event A.


Odds against the event A $= \frac{P(\bar{A})}{P(A)} = \frac{(n-r)/n}{r/n} = \frac{n-r}{r}$
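
The definitions of P(A), P(not-A) and odds above can be checked with exact fractions. The following is only a minimal sketch: the function names `classical_probability` and `odds_in_favour` are illustrative and do not come from the text, and the case r = n (a certain event) is left out of the odds calculation because the ratio is then undefined.

```python
from fractions import Fraction

def classical_probability(r, n):
    """P(A) = r/n for r favourable cases out of n equally likely cases."""
    if n <= 0 or not 0 <= r <= n:
        raise ValueError("need n > 0 and 0 <= r <= n")
    return Fraction(r, n)

def odds_in_favour(r, n):
    """Odds in favour of A, i.e. r : (n - r), as a pair in lowest terms."""
    ratio = Fraction(r, n - r)   # undefined when r == n (certain event)
    return ratio.numerator, ratio.denominator

p = classical_probability(1, 6)   # e.g. 'a five' with one unbiased die
print(p, 1 - p)                   # P(A) and P(not-A)
print(odds_in_favour(1, 6))       # odds in favour of A as (r, n - r)
```

Odds against A are obtained by swapping the two numbers of the pair.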


EXAMPLE : 
In a single toss of a fair coin, find the probability of getting 
‘head’. 


SOLUTION : 


There are two exhaustive, mutually exclusive and equally likely
outcomes of the toss of a fair coin. Out of these two outcomes, only
one is favourable to the event ‘head’.


Hence, probability of getting head $= \frac{1}{2}$.


EXAMPLE :


In a single toss of two fair coins, find the probability of obtaining
(i) both heads, (ii) one head and one tail, and (iii) at least one tail.


SOLUTION :


There are four exhaustive, mutually exclusive and equally likely
outcomes in a single toss of two fair coins, and they are (HH), (HT),
(TH) and (TT), where H denotes ‘head’ and T denotes ‘tail’, of which
only one outcome, viz. (HH), is favourable to the event ‘both heads’, two
outcomes, viz. (HT) and (TH), to the event ‘one head and one tail’ and
three outcomes (HT), (TH) and (TT) to the event ‘at least one tail’.

Hence,

(i) P (both heads) $= \frac{1}{4}$
(ii) P (one head and one tail) $= \frac{2}{4} = \frac{1}{2}$
(iii) P (at least one tail) $= \frac{3}{4}$
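
The counting in this solution can be reproduced by listing the four outcomes and counting the favourable ones. This is a small sketch only; the helper name `prob` and the way events are written as conditions are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product

# The four equally likely outcomes (H, H), (H, T), (T, H), (T, T).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = [s for s in sample_space if event(s)]
    return Fraction(len(favourable), len(sample_space))

p_both_heads = prob(lambda s: s == ("H", "H"))
p_one_each = prob(lambda s: sorted(s) == ["H", "T"])
p_at_least_one_tail = prob(lambda s: "T" in s)
print(p_both_heads, p_one_each, p_at_least_one_tail)
```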


EXAMPLE :


What is the probability of obtaining (i) an even number, (ii) a
number less than 5, and (iii) ‘a five’ in a single throw of an unbiased
die ?


SOLUTION :


There are six mutually exclusive, exhaustive and equally likely
outcomes, viz. the appearance of the numbers 1, 2, 3, 4, 5 or 6. Of
these outcomes three are favourable to the event ‘an even number’, viz.
the appearance of 2, 4 or 6, four are favourable to the event ‘number
less than 5’, viz. 1, 2, 3 or 4, and one is favourable to the event ‘a five’,
viz. 5.


Hence,
(i) P (even number) $= \frac{3}{6} = \frac{1}{2}$
(ii) P (less than 5) $= \frac{4}{6} = \frac{2}{3}$
(iii) P (a five) $= \frac{1}{6}$.


EXAMPLE : 


In a throw of two unbiased dice find the probability of throwing
(i) total seven points, and (ii) total eight points.


SOLUTION :


In a throw of one unbiased die there are six exhaustive, mutually
exclusive and equally likely outcomes. So, when two dice are thrown,
there are 6 × 6 = 36 exhaustive, mutually exclusive and equally likely
outcomes. Of these 36 outcomes only six, viz. (1, 6), (2, 5), (3, 4),
(4, 3), (5, 2), (6, 1), are favourable to the event ‘total seven points’.
Hence,

P (total seven points) $= \frac{6}{36} = \frac{1}{6}$.
For the ‘total eight points’ the favourable cases are five, viz.
(2, 6), (3, 5), (4, 4), (5, 3), (6, 2). Hence,
P (total eight points) $= \frac{5}{36}$.
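
The same count can be made mechanically for every possible total. The sketch below enumerates the 36 outcomes; the names `outcomes`, `totals` and `p_total` are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 6 x 6 = 36 equally likely outcomes of throwing two dice.
outcomes = list(product(range(1, 7), repeat=2))

# Number of outcomes giving each total from 2 to 12.
totals = Counter(a + b for a, b in outcomes)

# Classical probability of each total.
p_total = {t: Fraction(count, len(outcomes)) for t, count in totals.items()}
print(p_total[7], p_total[8])
```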


EXAMPLE: 

What is the probability of getting 3 white balls in a draw of
3 balls from a box containing 5 white and 4 black balls? (C.A. 1976)

SOLUTION :


Total number of balls in the box = 5 + 4 = 9.

3 balls can be drawn from 9 balls in ${}^{9}C_{3} = \frac{9 \times 8 \times 7}{3 \times 2 \times 1} = 84$ ways.

∴ Total number of possible cases = 84.
Total number of white balls in the box = 5.
3 white balls can be drawn from 5 white balls in
${}^{5}C_{3} = \frac{5 \times 4 \times 3}{3 \times 2 \times 1} = 10$ ways.

∴ Number of cases favourable to the event of getting 3 white
balls = 10.

∴ Probability of drawing 3 white balls $= \frac{10}{84} = \frac{5}{42}$.
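
The two combination counts can be verified with the standard library function `math.comb` (available from Python 3.8). The variable names below are illustrative only.

```python
from fractions import Fraction
from math import comb

total_ways = comb(9, 3)    # ways of drawing 3 balls out of 9
white_ways = comb(5, 3)    # ways of drawing 3 white balls out of 5
p_three_white = Fraction(white_ways, total_ways)
print(total_ways, white_ways, p_three_white)
```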


EXAMPLE :

A card is drawn at random from a well-shuffled pack of 52
cards. Find the probability and the odds that the card is a face-card
(i.e., king, queen or jack).


SOLUTION :


Total number of cards is 52. One card can be drawn from 52 cards
in 52 ways. So the total number of outcomes is 52. There are
3 face-cards in each of 4 suits, a total of 12 face-cards, and one face-card
from 12 face-cards can be drawn in 12 ways. Since the cards are
well-shuffled and one card is drawn at random, each of the 52 cards is
assumed equally likely to appear. Hence,

P (face-card) $= \frac{12}{52} = \frac{3}{13}$

Odds in favour of a face-card $= \frac{12}{52-12} = \frac{12}{40} = \frac{3}{10}$

and, odds against a face-card $= \frac{52-12}{12} = \frac{40}{12} = \frac{10}{3}$.


Limitations of Classical Definition : 


(1) This definition is applicable when each outcome is equally likely.
Being based on the idea of equally likely outcomes, which means equally probable
outcomes, the definition involves circular reasoning as the idea of probability has
been used as a part of the definition of probability. This makes the definition
unsatisfactory.

(2) It is not directly applicable when the total number of possible outcomes
is infinite and also when it is not possible to enumerate all the possible outcomes.

(3) This definition is not applicable when the outcomes are not equally likely. 

To remove these difficulties a second definition has been suggested as 
follows : 


Statistical Definition : (Empirical Approach)
If an event A is found to occur r times when a random experiment is repeated
n times, then r is called the frequency and $\frac{r}{n}$ is called the relative frequency of A.


The limiting value of this relative frequency $\frac{r}{n}$ when n increases indefinitely is
regarded as the probability of A connected with the experiment. Mathematically,

$$P(A) = \lim_{n \to \infty} \frac{r}{n}.$$

Thus the probability of an event A is the limit of the relative frequency of A
in an infinite sequence of trials.

The exact determination of the probability of any event is not practically
possible here since we are not sure that a limit to the relative frequency exists.
On the other hand, we may use the relative frequency as an estimate of the
probability of the event occurring under identical conditions. Hence, we have
the following definition of the probability of the event A:

$$P(A) = \frac{\text{Number of times A occurred}}{\text{Total number of trials}}$$
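
As a rough illustration of the relative-frequency idea, the sketch below imitates repeated tosses of a fair coin with a pseudo-random number generator and reports the proportion of heads for increasing numbers of trials. The function name `relative_frequency`, the fixed seed and the trial counts are assumptions made for the illustration; a simulation of this kind only stands in for real trials conducted under identical conditions.

```python
import random

def relative_frequency(trials, seed=0):
    """Relative frequency r/n of 'head' in `trials` simulated tosses."""
    rng = random.Random(seed)              # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials

for n in (10, 100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

For small n the ratio may be far from 1/2; it tends to settle near 1/2 as n grows, which is the behaviour the statistical definition relies on.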

The following points are to be recognised when statistical definition is used:

(i) Probability determined here is only an estimate of the actual value.

(ii) The larger the number of trials, the better the estimate of the
probability, and

(iii) The trials should be conducted under identical conditions.


Set Theoretic Approach.

To understand the probability theory and its potential for practi- 
cal application, it is helpful to understand the basic principles of 
Set Theory. 


Set 


A set is a collection of items or objects having some common
characteristic or characteristics. 


Sample Space: 

The set of all possible outcomes of an experiment is called the
sample space of the experiment and is denoted by S. Each outcome of
the experiment is called an element or a sample point of the sample
space. The sample space is also called Universal Set or Event Space or
Possibility Set.

A Sample Point is also called an Event Point.

Any outcome of the experiment corresponds to exactly one
element in the sample space of the experiment.

A sample space may be finite or infinite according as it contains 
a finite or infinite number of sample points. 


EXAMPLES :
(1) A random experiment of throwing a balanced coin has two
outcomes, Head and Tail. So, the sample space associated with this
experiment consists of two sample points and may be expressed as:
S = {H, T}, where H denotes head and T denotes tail of the coin.
(2) Sample spaces associated with the experiments of tossing a
balanced coin twice and thrice are respectively,
$S_1$ = {HH, HT, TH, TT}
$S_2$ = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}
$S_1$ consists of four and $S_2$ consists of eight sample points.
(3) Sample space associated with the experiment of throwing
one die consists of six sample points and is noted as :
S = {1, 2, 3, 4, 5, 6}.
(4) Sample space associated with the experiment of throwing
two dice consists of 36 sample points and is expressed as :
S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
(2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
(6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}
(5) Sample space associated with the experiment of tossing a
coin 10 times and recording the number of heads obtained is expressed
as:

S = {0, 1, 2, ......, 10}.

(6) Sample space associated with the experiment of tossing a
coin until a head appears for the first time and recording the number
of tosses is

S = {1, 2, 3, ......, ∞}.
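
The finite sample spaces in examples (2) to (5) can be generated rather than written out by hand. This is a minimal sketch using `itertools.product`; the variable names are illustrative, and example (6) is omitted because its sample space is infinite.

```python
from itertools import product

# (2) Tossing a coin twice and thrice.
S1 = ["".join(t) for t in product("HT", repeat=2)]
S2 = ["".join(t) for t in product("HT", repeat=3)]

# (3) One die and (4) two dice.
one_die = list(range(1, 7))
two_dice = list(product(range(1, 7), repeat=2))

# (5) Number of heads in 10 tosses.
heads_in_10 = list(range(0, 11))

print(S1)
print(len(S2), len(one_die), len(two_dice), len(heads_in_10))
```

Each extra toss doubles the number of sample points, and each extra die multiplies it by six.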


Event.


A set of outcomes of an experiment is called an event. Thus, an
event is a subset of the sample space of an experiment.


Every set of sample points in a sample space is an event.


As the empty set $\phi$ and the universal set S are subsets of S,
$\phi$ and S are also events. The event $\phi$ is called the impossible event and the
event S is called the sure or certain event.


Elementary Event :


An event A, in the sample space S, consisting of exactly one sample
point of S is called an elementary or simple event.


Thus an elementary event is an outcome of an experiment that
cannot be decomposed into a combination of other elementary events
of the sample space.


Compound Event :


An event is compound or composite if it can be decomposed into
elementary events.


In an experiment of tossing two unbiased coins there are four
elementary events, i.e., HH, HT, TH and TT, but the event ‘one head
and one tail’ is a composite event consisting of two elementary events
HT and TH.


Union.


The union of two events A and B is the set of all sample points
belonging to A or B (or both) and is denoted by $A \cup B$.


Symbolically, $A \cup B = \{x : x \in A \text{ or } x \in B\}$.


Intersection.


The intersection of two events A and B is the set of all sample
points common to both A and B and is denoted by $A \cap B$.


Symbolically, $A \cap B = \{x : x \in A \text{ and } x \in B\}$.


Mutually Exclusive Events. 


Events are mutually exclusive or disjoint when they have no
points or elements in common.


Thus, if $A_1, A_2, \ldots, A_n$ be n mutually exclusive events, then
$A_1 \cap A_2 \cap A_3 \cap \cdots \cap A_n = \phi$.
So, for two disjoint events A and B, we have $A \cap B = \phi$.


Complement.


An event $\bar{A}$ is said to be the complement of an event A if $\bar{A}$
consists of all sample points in the sample space S that are not points
of the event A.


Symbolically, $\bar{A} = \{x : x \in S,\ x \notin A\}$.
The complement of an event A is also denoted by $A'$ or $A^c$.
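
Union, intersection and complement of events correspond directly to set operations. The sketch below uses the throw of one die; the events A (‘an even number’) and B (‘a number less than 5’) are chosen only for illustration.

```python
S = {1, 2, 3, 4, 5, 6}     # sample space of throwing one die
A = {2, 4, 6}              # event 'an even number'
B = {1, 2, 3, 4}           # event 'a number less than 5'

union = A | B              # points in A or B (or both)
intersection = A & B       # points common to A and B
complement_A = S - A       # points of S that are not in A

print(union, intersection, complement_A)
print(A.isdisjoint(complement_A))   # A and its complement are mutually exclusive
```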


The number of all possible sample points or elementary events in
the sample space of an experiment is denoted by n(S) and the number of
sample points or elementary events in any event A is denoted by n(A).

Sample space of Equally Likely Outcomes : Definition of Probability 
(classical definition) : 

Most often, the very physical nature of the experiment suggests
that the different possible outcomes of the experiment are to be considered
equally likely. In this case, the various outcomes are assigned equal
probabilities. Such a sample space, in which each and every sample
point has the same probability, is called an equi-probable sample space
or sample space of equally likely outcomes.


(Example: The experiment of throwing a die has 6 sample
points in the sample space. Each of the sample points ‘face 1’,
‘face 2’, ......, ‘face 6’ has the probability $\frac{1}{6}$.)

Thus, if in a finite sample space of equally likely outcomes there
are n sample points, the probability associated with each sample point
would be 1/n, as the sum of the probabilities of all the sample points in
the given sample space must equal 1. Hence, for any event A in the
sample space S, consisting of m sample points, we have,

$$P(A) = \frac{1}{n} + \frac{1}{n} + \cdots \text{ to } m \text{ terms} = \frac{m}{n}$$

$$= \frac{\text{Number of sample points in } A}{\text{Number of sample points in } S} = \frac{\text{Number of elementary events in } A}{\text{Number of elementary events in } S} = \frac{n(A)}{n(S)}.$$

Properties of P(A).
(i) $P(A) = \frac{n(A)}{n(S)}$ must always lie between 0 and 1, since $0 \le n(A) \le n(S)$.
(ii) P(A) = 1 when n(A) = n(S), i.e., when the event is certain.
(iii) P(A) = 0 when n(A) = 0, i.e., when the event is impossible.


Note 1. To choose an object at random from n objects means that each
object has the same probability $\frac{1}{n}$ of being chosen.


Note 2. To choose k objects at random from n objects (k < n) means that
each set of k objects (disregarding order) has the same probability of being chosen as
any other set of k objects.


Rules of Probability 


I. Rule of Addition :


THEOREM 1: Theorem of Total Probability. If $A_1, A_2, \ldots, A_m$
be m mutually exclusive events, then

$$P(A_1 \cup A_2 \cup \cdots \cup A_m) = P(A_1) + P(A_2) + \cdots + P(A_m)$$

i.e. the probability of $A_1$ or $A_2$ or ...... or $A_m$ is the sum of the
probabilities of these events, provided the events are mutually
exclusive.

PROOF :


Let a finite sample space S of equally likely outcomes of a random
experiment consist of n(S) sample points of which $n(A_1)$ sample points
correspond to event $A_1$ and $n(A_2)$ sample points correspond to event
$A_2$. Since the events $A_1$ and $A_2$ are mutually exclusive, the number of
sample points that correspond to either $A_1$ or $A_2$ is then $n(A_1) + n(A_2)$.
Hence, by the definition of probability,

$$P(A_1 \cup A_2) = \frac{n(A_1) + n(A_2)}{n(S)} = \frac{n(A_1)}{n(S)} + \frac{n(A_2)}{n(S)} = P(A_1) + P(A_2) \qquad \ldots (1)$$

Repeated application of (1) leads to

$$\begin{aligned}
P(A_1 \cup A_2 \cup \cdots \cup A_{m-1} \cup A_m) &= P([A_1 \cup A_2 \cup \cdots \cup A_{m-1}] \cup A_m) \\
&= P(A_1 \cup A_2 \cup \cdots \cup A_{m-1}) + P(A_m) \\
&= P(A_1 \cup A_2 \cup \cdots \cup A_{m-2}) + P(A_{m-1}) + P(A_m) \\
&= P(A_1) + P(A_2) + \cdots + P(A_m)
\end{aligned}$$

when $A_1, A_2, \ldots, A_m$ are mutually exclusive events.


Cor. 1. If $A_1, A_2, \ldots, A_m$ are mutually exclusive events and
$A = A_1 \cup A_2 \cup \cdots \cup A_m$, then
$P(A) = P(A_1 \cup A_2 \cup \cdots \cup A_m) = P(A_1) + P(A_2) + \cdots + P(A_m)$.

Cor. 2. When the events $A_1, A_2, \ldots, A_m$ are mutually exclusive
and also exhaustive, then $A_1 \cup A_2 \cup \cdots \cup A_m = S$ and we have
$P(A_1 \cup A_2 \cup \cdots \cup A_m) = P(S) = 1$, implying that

$P(A_1) + P(A_2) + \cdots + P(A_m) = 1$.


Cor. 3. The event A and its complement $\bar{A}$ are mutually exclusive
and hence, $P(A \cup \bar{A}) = P(A) + P(\bar{A})$. Since $A \cup \bar{A} = S$, it follows that
$P(A \cup \bar{A}) = P(S) = 1$. Therefore, $P(\bar{A}) = 1 - P(A)$.

Cor. 4. If A and B are two events then the events $(A \cap \bar{B})$ and
$(A \cap B)$ are mutually exclusive and also, $A = (A \cap \bar{B}) \cup (A \cap B)$ and hence,
$P(A) = P(A \cap \bar{B}) + P(A \cap B)$, implying $P(A \cap \bar{B}) = P(A) - P(A \cap B)$. Again,
$(A \cap \bar{B}) = A - (A \cap B)$. So, $P\{A - (A \cap B)\} = P(A) - P(A \cap B)$.

Cor. 5. By De Morgan’s Law, we have $(\bar{A} \cup \bar{B}) = \overline{(A \cap B)}$ and
hence, $P(\bar{A} \cup \bar{B}) = P(\overline{A \cap B}) = 1 - P(A \cap B)$, by Cor. 3.

∴ $P(\bar{A} \cup \bar{B}) = 1 - P(A \cap B)$.

Cor. 6. If $\phi$ be the impossible event, then $P(\phi) = 0$. For, if A be
any event, then $A \cup \phi = A$ and so $P(A) = P(A \cup \phi) = P(A) + P(\phi)$,
since A and $\phi$ are mutually exclusive, implying $P(\phi) = 0$.


THEOREM 2:


Generalised Theorem on Total Probability. If A and B are any
two events, not necessarily mutually exclusive, in the sample space S,
then

$$P(A \cup B) = P(A) + P(B) - P(A \cap B).$$


PROOF :


Let n(S) be the total number of sample points in a finite sample
space S of a random experiment and $n(A \cup B)$ the number of sample
points in $(A \cup B)$. Hence, by the definition of probability,

$$P(A \cup B) = \frac{n(A \cup B)}{n(S)}.$$

Now, $n(A \cup B) = n(A) + n(B) - n(A \cap B)$ [see page 402]

$$\therefore\ P(A \cup B) = \frac{n(A) + n(B) - n(A \cap B)}{n(S)} = \frac{n(A)}{n(S)} + \frac{n(B)}{n(S)} - \frac{n(A \cap B)}{n(S)} = P(A) + P(B) - P(A \cap B).$$

The result can also be written as,


P(A or B) = P(A) + P(B) − P(A and B).


ALTERNATIVE PROOF :
Let A and B be any two events, in the sample space S of a
random experiment, not necessarily mutually exclusive. Then
$A - (A \cap B)$, $(A \cap B)$ and $B - (A \cap B)$ are three mutually exclusive events.
Now, $A = \{A - (A \cap B)\} \cup (A \cap B)$,
$B = \{B - (A \cap B)\} \cup (A \cap B)$
and $(A \cup B) = \{A - (A \cap B)\} \cup (A \cap B) \cup \{B - (A \cap B)\}$.
Hence, $P(A) = P\{A - (A \cap B)\} + P(A \cap B)$, ... (1)
since $A - (A \cap B)$ and $A \cap B$ are mutually exclusive;
$P(B) = P\{B - (A \cap B)\} + P(A \cap B)$, ... (2)
since $B - (A \cap B)$ and $A \cap B$ are mutually exclusive;
and $P(A \cup B) = P\{A - (A \cap B)\} + P(A \cap B) + P\{B - (A \cap B)\}$, ... (3)
since $A - (A \cap B)$, $A \cap B$ and $B - (A \cap B)$ are mutually exclusive.
From (3) with the help of (1) and (2), we get
$P(A \cup B) = P(A) - P(A \cap B) + P(A \cap B) + P(B) - P(A \cap B)$ [by Cor. 4.]
$= P(A) + P(B) - P(A \cap B)$.


THEOREM 3:
If A, B and C are any three events, then
$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C).$$


PROOF :
If A, B and C are any three events, then
$$\begin{aligned}
P(A \cup B \cup C) &= P[(A \cup B) \cup C] \\
&= P(A \cup B) + P(C) - P[(A \cup B) \cap C] \\
&= P(A) + P(B) - P(A \cap B) + P(C) - P[(A \cap C) \cup (B \cap C)],
\end{aligned}$$
since, by the distributive property of the sets,
$(A \cup B) \cap C = (A \cap C) \cup (B \cap C)$.
Now, $P[(A \cap C) \cup (B \cap C)] = P(A \cap C) + P(B \cap C) - P[(A \cap C) \cap (B \cap C)]$
$= P(A \cap C) + P(B \cap C) - P(A \cap B \cap C)$,
since $(A \cap C) \cap (B \cap C) = (A \cap B \cap C)$.
Hence, $P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$.
The above theorem can be extended to the case of more than three
events.
For instance, if $A_1, A_2, \ldots, A_n$ be any n events, then it can be
shown by induction that

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \cdots + (-1)^{n-1} P(A_1 \cap A_2 \cap \cdots \cap A_n) \qquad \ldots (1)$$
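
The general formula (1) can be checked numerically on a small equi-probable sample space. The sketch below is an illustration only, not part of the proof: it uses the throw of two dice, three events chosen arbitrarily, and the hypothetical helper names `P` and `union_probability`.

```python
from fractions import Fraction
from itertools import combinations, product

S = set(product(range(1, 7), repeat=2))    # two dice, 36 equally likely points

def P(event):
    """Classical probability n(A)/n(S)."""
    return Fraction(len(event), len(S))

def union_probability(events):
    """Right-hand side of (1): alternating sum over all intersections."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k - 1)
        for group in combinations(events, k):
            total += sign * P(set.intersection(*group))
    return total

A = {s for s in S if s[0] == 6}       # first die shows 6
B = {s for s in S if s[1] == 6}       # second die shows 6
C = {s for s in S if sum(s) == 7}     # total is 7

print(union_probability([A, B, C]), P(A | B | C))
```

Both printed values agree, as the theorem requires; with mutually exclusive events every intersection term vanishes and only the first sum remains.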


Note 1. When n events $A_1, A_2, \ldots, A_n$ are mutually exclusive, then from (1),
$P(A_1 \cup A_2 \cup \cdots \cup A_n) = P(A_1) + P(A_2) + \cdots + P(A_n)$,
since all other terms of the R.H.S. of (1) become zero.


Note 2. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$\Rightarrow P(A \cup B) \le P(A) + P(B)$, since $P(A \cap B) \ge 0$.
The sign of equality holds only when A and B are mutually exclusive. This
inequality is known as Boole’s inequality.


EXAMPLE :
Find the probability of throwing a total of 8 in tossing two
balanced dice.


SOLUTION :
The sample space here consists of 36 sample points.
Let A denote the event of throwing a total of 8; then A consists
of 5 sample points (2, 6), (3, 5), (4, 4), (5, 3), (6, 2).
∴ $P(A) = \frac{5}{36}$.


EXAMPLE : 


A card is drawn from a well-shuffled pack of 52 cards. What is
the probability of the card being either black or an ace ?


SOLUTION :
The sample space of the experiment of drawing a card from a
well-shuffled pack of 52 cards consists of 52 sample points.


Let A denote the event that the card is black; then A consists of
26 sample points since there are 26 black cards. If B denotes the
event that the card is an ace, then B consists of 4 sample points, since
the number of ace cards is 4. Now, $(A \cap B)$ consists of two sample
points, viz. two black aces.

∴ $P(A) = \frac{26}{52}$, $P(B) = \frac{4}{52}$ and $P(A \cap B) = \frac{2}{52}$.

∴ the probability that the card drawn is either black or an ace
is P(A or B) = $P(A \cup B)$. Since the events A and B are not mutually
exclusive, we have,

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{26}{52} + \frac{4}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}.$$
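
The same answer is obtained by building the pack explicitly and counting. In the sketch below the rank and suit labels and the names `deck`, `black` and `aces` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["spades", "clubs", "hearts", "diamonds"]
deck = set(product(ranks, suits))                  # 52 equally likely cards

black = {c for c in deck if c[1] in ("spades", "clubs")}   # event A
aces = {c for c in deck if c[0] == "A"}                     # event B

def P(event):
    return Fraction(len(event), len(deck))

by_counting = P(black | aces)                      # count A ∪ B directly
by_addition_rule = P(black) + P(aces) - P(black & aces)
print(by_counting, by_addition_rule)
```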
EXAMPLE : 


An urn contains 13 balls numbered from 1 to 13. Find the
probability that a ball selected at random is a ball with number that
is a multiple of 3 or 4.


SOLUTION :


Here, the sample space consists of 13 sample points. If A be the
event that the ball selected is a ball with number that is a multiple of
3, then A consists of 4 sample points, viz. 3, 6, 9, 12. Hence,
$P(A) = \frac{4}{13}$.


Similarly, if B be the event that the ball selected is a ball with
number that is a multiple of 4, then B consists of 3 sample points,
viz. 4, 8, 12.

∴ $P(B) = \frac{3}{13}$.
The event $A \cap B$ comprises only one sample point, i.e., 12,
which is a multiple of both 3 and 4.
∴ $P(A \cap B) = \frac{1}{13}$,
∴ $P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{4}{13} + \frac{3}{13} - \frac{1}{13} = \frac{6}{13}$, which is the probability
that the number on the ball selected at random is a multiple of 3 or 4.
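
A short enumeration confirms the count, assuming (as the four multiples 3, 6, 9, 12 in the solution indicate) that the urn holds 13 balls numbered 1 to 13. The names used are illustrative.

```python
from fractions import Fraction

balls = set(range(1, 14))                  # assumed: balls numbered 1 to 13
A = {b for b in balls if b % 3 == 0}       # multiples of 3
B = {b for b in balls if b % 4 == 0}       # multiples of 4

p_union = Fraction(len(A | B), len(balls))
p_by_rule = (Fraction(len(A), len(balls)) + Fraction(len(B), len(balls))
             - Fraction(len(A & B), len(balls)))
print(sorted(A), sorted(B), sorted(A & B), p_union, p_by_rule)
```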


EXAMPLE :


In drawing a card from a well-shuffled pack, find the probability
that the card drawn will be either a spade or a heart.


SOLUTION :
The experiment of drawing a card from a well-shuffled pack
consists of 52 equally likely outcomes. So the sample space consists
of 52 sample points.
Let A = {the card is a spade},
and B = {the card is a heart}.
A consists of 13 sample points, since there are 13 spade cards.
∴ $P(A) = \frac{13}{52}$.
Similarly, B also consists of 13 sample points.
∴ $P(B) = \frac{13}{52}$.
The events A and B are mutually exclusive, since a spade and a
heart cannot both occur in the same draw.
∴ $P(A \cup B) = P(A) + P(B) = \frac{13}{52} + \frac{13}{52} = \frac{1}{2}$.


EXAMPLE :

The probability that a contractor will get a plumbing contract
is $\frac{2}{3}$, and the probability that he will not get an electric contract
is $\frac{5}{9}$. If the probability of getting at least one contract is $\frac{4}{5}$, what
is the probability that he will get both the contracts ? (C. A. 1979)


SOLUTION :
Let A be the event that the contractor will get a plumbing
contract, then $P(A) = \frac{2}{3}$.
If B be the event that the contractor will get an electric contract,
then $P(\bar{B}) = \frac{5}{9}$ (given).
∴ $P(B) = 1 - P(\bar{B}) = 1 - \frac{5}{9} = \frac{4}{9}$.
Also given, $P(A \cup B) = \frac{4}{5}$.
∴ by the addition rule, we have,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$,
or, $\frac{4}{5} = \frac{2}{3} + \frac{4}{9} - P(A \cap B)$,
or, $P(A \cap B) = \frac{2}{3} + \frac{4}{9} - \frac{4}{5} = \frac{14}{45}$.


∴ Probability that the contractor will get both the
contracts $= \frac{14}{45}$.
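
The arithmetic of this solution can be rechecked with exact fractions. The sketch assumes the given probabilities are 2/3 (plumbing contract), 5/9 (no electric contract) and 4/5 (at least one contract); the variable names are illustrative.

```python
from fractions import Fraction

p_plumbing = Fraction(2, 3)       # P(A), assumed reading of the data
p_no_electric = Fraction(5, 9)    # P(not-B), assumed reading of the data
p_at_least_one = Fraction(4, 5)   # P(A ∪ B), assumed reading of the data

p_electric = 1 - p_no_electric                      # P(B) = 1 - P(not-B)
p_both = p_plumbing + p_electric - p_at_least_one   # addition rule rearranged
print(p_electric, p_both)
```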


EXAMPLE : 


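A and B are two events, not mutually exclusive, connected with
a random experiment E. If $P(A) = \frac{1}{4}$, $P(B) = \frac{2}{5}$ and $P(A \cup B) = \frac{1}{2}$,
find the values of the following probabilities :


(i) $P(A \cap B)$, (ii) $P(A \cap B^c)$, (iii) $P(A^c \cup B^c)$
where c stands for the complement. (C. U. 1980)


SOLUTION :
(i) We have,


$P(A \cup B) = P(A) + P(B) - P(A \cap B)$, since A and B are not
mutually exclusive,


i.e., $\frac{1}{2} = \frac{1}{4} + \frac{2}{5} - P(A \cap B)$
or $P(A \cap B) = \frac{1}{4} + \frac{2}{5} - \frac{1}{2} = \frac{3}{20}$.
(ii) $P(A \cap B^c) = P(A) - P(A \cap B) = \frac{1}{4} - \frac{3}{20} = \frac{2}{20} = \frac{1}{10}$ [by Cor. (4)]
(iii) $P(A^c \cup B^c) = P\{(A \cap B)^c\} = 1 - P(A \cap B) = 1 - \frac{3}{20} = \frac{17}{20}$
[by Cor. (5)]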
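
The three results follow from the addition rule together with Cor. 4 and Cor. 5, and can be rechecked with exact fractions. The sketch assumes the data P(A) = 1/4, P(B) = 2/5 and P(A ∪ B) = 1/2; the variable names are illustrative.

```python
from fractions import Fraction

p_a = Fraction(1, 4)          # P(A), assumed reading of the data
p_b = Fraction(2, 5)          # P(B), assumed reading of the data
p_a_or_b = Fraction(1, 2)     # P(A ∪ B), assumed reading of the data

p_a_and_b = p_a + p_b - p_a_or_b      # (i)   addition rule rearranged
p_a_not_b = p_a - p_a_and_b           # (ii)  Cor. 4
p_not_a_or_not_b = 1 - p_a_and_b      # (iii) Cor. 5 (De Morgan)
print(p_a_and_b, p_a_not_b, p_not_a_or_not_b)
```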
Compound Probability. 


The probability of occurrence of two or more events simultaneously
is termed as compound probability. The usual notation for
compound probability for two events $A_1, A_2$ is $P(A_1 \cap A_2)$ and for
events $A_1, A_2, \ldots, A_n$ is $P(A_1 \cap A_2 \cap \cdots \cap A_n)$.


Note. Compound probability is also called Joint Probability.


Conditional Probability.


Let A and B be two events in the sample space S of a random 
experiment, such that P(A) >0. The probability of occurrence of 
event B, subject to the condition that event A has already occurred, is 
called the conditional probability of B, given A. In terms of symbols, 
it is written P (B| A) and is defined by the proportion of Sample points 
in event B among the sample points in event A. 


The vertical bar is read ‘given’. 


II. Rule of Multiplication :
THEOREM 1:


The probability of simultaneous occurrence of two events is
equal to the product of the probability of one of the events and the
conditional probability of the other, given that the first one has
already occurred.


PROOF : 


Let S be a finite sample space of equally likely outcomes of a
random experiment and n(S) be the total number of sample points
in S. Let A be any event in S with n(A) sample points, such that
n(A) > 0. Together with the n(A) sample points, let us consider the
number of sample points $n(A \cap B)$ that are simultaneously in A and B.
Then the ratio

$$\frac{n(A \cap B)}{n(A)}$$

is the proportion of sample points in B among the sample points in A
and is the conditional probability of B, given A.

$$\therefore\ P(B|A) = \frac{n(A \cap B)}{n(A)} = \frac{n(A \cap B)}{n(S)} \times \frac{n(S)}{n(A)} = \frac{n(A \cap B)/n(S)}{n(A)/n(S)} = \frac{P(A \cap B)}{P(A)} \qquad \ldots (1)$$

In the similar way we can derive,
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \text{ if } n(B) > 0, \text{ i.e., } P(B) > 0 \qquad \ldots (2)$$
From the equations (1) and (2), we get,
$$P(A \cap B) = P(A)\,P(B|A) = P(B)\,P(A|B) \qquad \ldots (3)$$

Note 1. In computing P(B|A) we are essentially computing the probability
of B with reference to the subset A of the original sample space S, rather than
with reference to the original sample space S. This subset A of S is called the
reduced sample space. Thus in the conditional probability of B, given A, the
reduced sample space is A, the given event. But in computing P(B) we compute
the probability of B with reference to the original sample space S. P(B) is
really an abbreviation for P(B|S). But S is dropped as understood.

Note 2. P(B/A) may be computed either directly by calculating the probabi- 
lity of B with reference to the reduced sample space A or by using 

P (BJA)=P (AMB)|P (A) 
where P(AMB) and P(A) are computed with reference to the ‘origina] sample 
space S, 

Remark. In finite sample space of equally likely outcomes, P (A) should be 
not equal to zero to define P (B/A), and P (B) should not be equal to zero to define 
P (A/B). 
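
The two routes described in Note 2 can be compared on a small case. The following is only an illustrative sketch: the helper names `prob` and `conditional` and the one-die example are assumptions made here, not part of the text.

```python
from fractions import Fraction

def prob(event, space):
    """P(event) in a finite sample space of equally likely points."""
    return Fraction(len(event & space), len(space))

def conditional(b, a, space):
    """P(B|A) = P(A ∩ B) / P(A); defined only when P(A) > 0."""
    p_a = prob(a, space)
    if p_a == 0:
        raise ValueError("P(A) must be positive to define P(B|A)")
    return prob(a & b, space) / p_a

# Hypothetical illustration: one roll of a fair die.
S = set(range(1, 7))
A = {x for x in S if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in S if x > 3}        # greater than 3: {4, 5, 6}

direct = Fraction(len(A & B), len(A))   # counting inside the reduced sample space A
by_formula = conditional(B, A, S)       # P(A ∩ B) / P(A) in the original space S
print(direct, by_formula)               # 2/3 2/3
```

Both routes give the same value, since dividing numerator and denominator by $n(S)$ does not change the ratio.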


THEOREM 2 :
Extension of the Theorem of Compound Probability. The probability
of simultaneous occurrence of n events $A_1, A_2, \ldots, A_n$ is

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}),$$

provided $P(A_1) > 0$, $P(A_2 \mid A_1) > 0$, $P(A_3 \mid A_1 \cap A_2) > 0$, ...,
$P(A_{n-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{n-2}) > 0$.


PROOF :


Let $n(S)$ be the total number of sample points in a finite sample
space S of a random experiment, of which there are $n(A_1)$ sample
points in $A_1$ and $n(A_1 \cap A_2)$ sample points in $(A_1 \cap A_2)$.

$$\therefore\ P(A_1 \cap A_2) = \frac{n(A_1 \cap A_2)}{n(S)}. \qquad (1)$$

Since $P(A_1) > 0$, i.e., $n(A_1) > 0$, equation (1) can be written
in the form :

$$P(A_1 \cap A_2) = \frac{n(A_1 \cap A_2)}{n(A_1)} \cdot \frac{n(A_1)}{n(S)}.$$

But $\dfrac{n(A_1 \cap A_2)}{n(A_1)} = P(A_2 \mid A_1)$ and $\dfrac{n(A_1)}{n(S)} = P(A_1)$.

Hence, $P(A_1 \cap A_2) = P(A_1)\,P(A_2 \mid A_1)$. $\qquad$ (2)

Again, since $P(A_1) > 0$ and $P(A_2 \mid A_1) > 0$, we have from (2),
$P(A_1 \cap A_2) > 0$, and hence,

$$\begin{aligned}
P(A_1 \cap A_2 \cap A_3) &= P\{(A_1 \cap A_2) \cap A_3\} \\
&= P(A_1 \cap A_2)\,P(A_3 \mid A_1 \cap A_2) \\
&= P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2).
\end{aligned}$$

Proceeding in this way we get,

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$


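Cor. 1. If the events $A_1, A_2, \ldots, A_n$ are mutually exclusive
and exhaustive, then $B \cap A_1, B \cap A_2, \ldots, B \cap A_n$ are mutually
exclusive and $B = (B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)$.

$$\begin{aligned}
\therefore\ P(B) &= P\{(B \cap A_1) \cup (B \cap A_2) \cup \cdots \cup (B \cap A_n)\} \\
&= P(B \cap A_1) + P(B \cap A_2) + \cdots + P(B \cap A_n) \\
&= P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n),
\end{aligned}$$

provided $P(A_1) > 0$, $P(A_2) > 0$, ..., $P(A_n) > 0$.


Cor. 2. If A and B be two events, then $A = (A \cap B) \cup (A \cap \bar{B})$ and
$A \cap B$ and $A \cap \bar{B}$ are mutually exclusive.


Hence, $P(A) = P(A \cap B) + P(A \cap \bar{B})$.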


Independent Events.


Two events are said to be independent if the occurrence of one 
event does not influence the occurrence of the other event. 


EXAMPLE :

Successive tosses of a fair coin are independent. If a fair coin is
tossed twice, the event ‘Head’ in the first toss and the event ‘Head’ in
the second toss are independent, since the occurrence of ‘Head’ in any
toss does not influence the occurrence of ‘Head’ in the other toss, and
the probability of getting a Head, say, in the second toss, which is $\frac{1}{2}$,
does not change whether or not it is known that the first toss has
resulted in a ‘Head’.

Similarly, if two cards are drawn with replacement from a pack of 
well-shuffled cards, the events (A) ‘black card in the first draw’ and 
(B) ‘black card in the second draw’ are independent. But if the drawing 
is made without replacement, the events A and B will be dependent 
events. 

Two or more events are considered to be independent if the occurrence or non- 
occurrence of one has no effect on whether the other or others occur, 


If events A and B are such that
$P(A \mid B) = P(A)$,
then the occurrence of event B does not alter the probability of
event A and hence we say that the event A is independent of event B.


If event A is independent of event B, then the event B is also
independent of event A.


For, we have
$P(A \cap B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$.
Now, since event A is independent of event B,
$P(A \mid B) = P(A)$.
$\therefore\ P(A)\,P(B \mid A) = P(B)\,P(A)$
$\Rightarrow P(B \mid A) = P(B)$, since $P(A) > 0$,
that is, the event B is independent of event A.


For independent events A and B the compound probability
theorem takes the following simple form :


$P(A \cap B) = P(A)\,P(B)$.
This relation is also used to define the independence of the two
events A and B.


Definition : Two events A and B are called independent when
the relation $P(A \cap B) = P(A)\,P(B)$ holds; otherwise they are called
dependent events.


Remark : The relation $P(A \cap B) = P(A)\,P(B)$ is accepted to remain valid for all
values of $P(A)$ and $P(B)$, including $P(A) = 0$ and $P(B) = 0$.

In general, when a finite number of events $A_1, A_2, \ldots, A_n$ are
independent, we have

$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)\,P(A_2) \cdots P(A_n)$.

Note. If two events A and B are mutually exclusive then $A \cap B = \phi$ and hence
$P(A \cap B) = P(\phi) = 0$.
THEOREM :


If A and B are two independent events, then $\bar{A}$ and $\bar{B}$ are also
independent.


PROOF : We have

$$\begin{aligned}
P(\bar{A} \cap \bar{B}) &= P(\overline{A \cup B}), \text{ by De Morgan's Law} \\
&= 1 - P(A \cup B) \\
&= 1 - \{P(A) + P(B) - P(A \cap B)\} \\
&= 1 - P(A) - P(B) + P(A \cap B) \\
&= 1 - P(A) - P(B) + P(A)\,P(B), \text{ since A and B are independent} \\
&= 1 - P(A) - P(B)\{1 - P(A)\} \\
&= \{1 - P(A)\}\{1 - P(B)\} \\
&= P(\bar{A})\,P(\bar{B}),
\end{aligned}$$

that is, the events $\bar{A}$ and $\bar{B}$ are independent.


THEOREM :


If A and B are two independent events, then A and $\bar{B}$ are also
independent.


PROOF :
Since $(A \cap \bar{B}) = A - (A \cap B)$, we have,

$$\begin{aligned}
P(A \cap \bar{B}) &= P\{A - (A \cap B)\} \\
&= P(A) - P(A \cap B) \quad [\text{by Cor. 4, see page 425}] \\
&= P(A) - P(A)\,P(B), \text{ since A and B are independent events} \\
&= P(A)\{1 - P(B)\} \\
&= P(A)\,P(\bar{B}),
\end{aligned}$$

that is, the events A and $\bar{B}$ are independent.


ALTERNATIVE PROOF :
We have, indeed,
$P(B \mid A) + P(\bar{B} \mid A) = 1$
$\Rightarrow P(B) + P(\bar{B} \mid A) = 1$, since A and B are independent
$\Rightarrow P(\bar{B} \mid A) = 1 - P(B) = P(\bar{B})$,
that is, the events A and $\bar{B}$ are independent.

THEOREM :
If A and B are two independent events, then events $\bar{A}$ and B are
also independent.
PROOF :
Since $(\bar{A} \cap B) = B - (A \cap B)$, we have,

$$\begin{aligned}
P(\bar{A} \cap B) &= P\{B - (A \cap B)\} \\
&= P(B) - P(A \cap B) \quad [\text{by Cor. 4, see page 425}] \\
&= P(B) - P(A)\,P(B), \text{ since A and B are independent events} \\
&= P(B)\{1 - P(A)\} \\
&= P(B)\,P(\bar{A}),
\end{aligned}$$

that is, the events $\bar{A}$ and B are independent.

ALTERNATIVE PROOF :
We have, indeed,
$P(A \mid B) + P(\bar{A} \mid B) = 1$
$\Rightarrow P(A) + P(\bar{A} \mid B) = 1$, since A and B are independent
$\Rightarrow P(\bar{A} \mid B) = 1 - P(A) = P(\bar{A})$,
that is, the events $\bar{A}$ and B are independent.


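THEOREM :
If A and B are two independent events, then
$P(A \cup B) = 1 - P(\bar{A})\,P(\bar{B})$.


PROOF :
Since $(A \cup B)$ is the complement of $\overline{(A \cup B)}$, we have

$$\begin{aligned}
P(A \cup B) &= 1 - P(\overline{A \cup B}) \\
&= 1 - P(\bar{A} \cap \bar{B}) \quad [\text{by De Morgan's Law}] \\
&= 1 - P(\bar{A})\,P(\bar{B})
\end{aligned}$$

[Since A and B are independent, $\bar{A}$ and $\bar{B}$ are also independent.]


Note. If there are n independent events $A_1, A_2, \ldots, A_n$ then
$P(A_1 \cup A_2 \cup \cdots \cup A_n) = 1 - P(\bar{A}_1)\,P(\bar{A}_2) \cdots P(\bar{A}_n)$.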


THEOREM :
If A and B are two independent events such that $P(A) > 0$ and
$P(B) > 0$, then $(A \cap B) \neq \phi$, i.e., A and B are not mutually exclusive.
PROOF :
If possible, let $(A \cap B) = \phi$, where $\phi$ is an impossible event. Then
$P(A \cap B) = P(\phi) = 0$.
Since A and B are independent events,
$P(A \cap B) = P(A)\,P(B)$.


Therefore, $P(A)\,P(B) = 0$, implying that either $P(A) = 0$ or $P(B) = 0$,
which contradicts the hypothesis of the theorem that $P(A) \neq 0$ and
$P(B) \neq 0$. Hence $(A \cap B) \neq \phi$, i.e., A and B are not mutually exclusive.


Thus, two events with non-zero probabilities cannot be mutually 
exclusive and independent simultaneously. 


Two events A and B can be mutually exclusive and independent
simultaneously if either $P(A) = 0$ or $P(B) = 0$.


THEOREM :


If two events A and B, having non-zero probabilities, are
mutually exclusive then both $P(A \mid B)$ and $P(B \mid A)$ are equal to zero.


PROOF :


Since events A and B are mutually exclusive, we have,
$P(A \cap B) = P(\phi) = 0$.


Now, $P(A \mid B) = \dfrac{P(A \cap B)}{P(B)} = \dfrac{0}{P(B)} = 0$, since $P(B) > 0$.


Similarly, $P(B \mid A) = \dfrac{P(A \cap B)}{P(A)} = \dfrac{0}{P(A)} = 0$, since $P(A) > 0$.


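Bayes' Theorem.

Let $A_1, A_2, \ldots, A_n$ be n mutually exclusive events whose union is the
sample space S in a random experiment and let B be an arbitrary event in the
sample space such that $P(B) \neq 0$, then,

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}.$$

PROOF :
By the law of Conditional Probability,

$$P(B \mid A_i) = \frac{P(A_i \cap B)}{P(A_i)}$$

or $P(A_i \cap B) = P(A_i)\,P(B \mid A_i)$.


Similarly, $P(B \cap A_i) = P(B)\,P(A_i \mid B)$.
Now, since $P(A_i \cap B) = P(B \cap A_i)$, it follows that
$P(A_i)\,P(B \mid A_i) = P(B)\,P(A_i \mid B)$

$$\therefore\ P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(B)}.$$

Since the events $A_1, A_2, \ldots, A_n$ are mutually exclusive and exhaustive, and
$P(B) \neq 0$,

$$\begin{aligned}
P(B) &= P(A_1 \cap B) + P(A_2 \cap B) + \cdots + P(A_n \cap B) \\
&= P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + \cdots + P(A_n)\,P(B \mid A_n).
\end{aligned}$$

Substituting this value of $P(B)$ in the expression for $P(A_i \mid B)$ gives the result.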
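
The formula lends itself to a direct computation. The sketch below is an illustration only: the function name `bayes_posteriors` and the two-machine figures are hypothetical and are not taken from the text.

```python
from fractions import Fraction

def bayes_posteriors(priors, likelihoods):
    """Return the list of P(A_i|B), given P(A_i) and P(B|A_i) for each i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i) P(B|A_i)
    p_b = sum(joint)                                       # P(B), by total probability
    if p_b == 0:
        raise ValueError("P(B) must be non-zero")
    return [j / p_b for j in joint]

# Hypothetical data: machine 1 makes 3/5 of the output with 1% defectives,
# machine 2 makes 2/5 of the output with 3% defectives; B = "item is defective".
priors = [Fraction(3, 5), Fraction(2, 5)]
likelihoods = [Fraction(1, 100), Fraction(3, 100)]
posteriors = bayes_posteriors(priors, likelihoods)
print(posteriors)   # [Fraction(1, 3), Fraction(2, 3)]
```

The posterior probabilities always add up to 1, because the denominator is the sum of the numerators.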


EXAMPLE: 


An urn contains 7 black and 5 white balls. Two balls are drawn
at random one after the other. Find the probability that both balls
drawn are black (i) when the first ball drawn is not replaced before
drawing the second and (ii) when the first ball drawn is replaced before
drawing the second ball.


SOLUTION :


(i) The sample space consists of 12 sample points as there are
altogether 7 + 5 = 12 balls.


Let A = {1st ball drawn is black},
and B = {2nd ball drawn is black}.
The event A consists of 7 sample points as there are 7 black balls.
$\therefore\ P(A) = \frac{7}{12}$.


Now since the first ball drawn is black and is not replaced, the
sample space reduces to 11 points only, as there are only 6 black and
5 white balls left. The event B now consists of 6 sample points as
there are now 6 black balls.


$\therefore\ P(B \mid A) = \frac{6}{11}$.
$\therefore$ the probability that both balls drawn are black is,
$P(A \cap B) = P(A)\,P(B \mid A)$
$= \frac{7}{12} \times \frac{6}{11} = \frac{7}{22}$.

(ii) Now the events A and B are independent,
$P(A) = \frac{7}{12}$ ; $P(B) = \frac{7}{12}$ ;

and $P(A \cap B) = P(A)\,P(B) = \frac{7}{12} \times \frac{7}{12} = \frac{49}{144}$.
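
As a cross-check of the urn with 7 black and 5 white balls, the ordered pairs of draws can simply be enumerated. This is a sketch for illustration; the helper name `both_black` is an assumption, not notation from the text.

```python
from fractions import Fraction
from itertools import permutations, product

balls = ["B"] * 7 + ["W"] * 5   # 7 black and 5 white balls, indexed 0..11

def both_black(pairs):
    """Proportion of (first, second) draws in which both balls are black."""
    pairs = list(pairs)
    hits = sum(1 for i, j in pairs if balls[i] == "B" and balls[j] == "B")
    return Fraction(hits, len(pairs))

# Without replacement: ordered pairs of distinct balls (12 x 11 = 132 pairs).
without_replacement = both_black(permutations(range(12), 2))
# With replacement: the same ball may be drawn twice (12 x 12 = 144 pairs).
with_replacement = both_black(product(range(12), repeat=2))
print(without_replacement, with_replacement)   # 7/22 49/144
```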

EXAMPLE : 


A bag contains 7 red and 5 white balls. 2 balls are drawn at
random without replacement. What is the probability that the second
ball is red, knowing that the first ball is red ?


SOLUTION :


Let A = {1st ball is red} ; B = {2nd ball is red}.
Total number of balls in the bag = 7 + 5 = 12.


There are ${}^{12}C_2$ ways of drawing 2 balls from the bag and
hence the sample space consists of ${}^{12}C_2$ sample points.


The number of ways of drawing 2 red balls from 7 red balls is
${}^{7}C_2$.


$\therefore\ P(A \cap B) = \dfrac{{}^{7}C_2}{{}^{12}C_2} = \dfrac{21}{66} = \dfrac{7}{22}$.
$P(A)$ = probability that the 1st ball drawn is red $= \frac{7}{12}$.
Now, $P(A \cap B) = P(A)\,P(B \mid A)$
$\therefore\ \frac{7}{22} = \frac{7}{12}\,P(B \mid A)$
$\Rightarrow P(B \mid A) = \frac{7}{22} \times \frac{12}{7} = \frac{6}{11}$.

Note. $P(B \mid A)$ can be directly computed. For when the 1st ball drawn is red,
there remain 6 red and 5 white balls in the bag and hence $P(B \mid A) = \frac{6}{11}$.
EXAMPLE : 

In throwing two fair dice, find the probability of getting a total
of 9, when it is known that the second die will show a smaller value than
the first die.


SOLUTION :
The sample space consists of $6^2 = 36$ sample points.


Let A = {Sum of the points on the two dice = 9},
B = {Points on the 1st die are greater than the points on the
2nd die in any throw}.
Thus A = {(6, 3), (5, 4), (4, 5), (3, 6)} ;
B = {(2, 1), (3, 1), (4, 1), (5, 1), (6, 1), (3, 2), (4, 2), (5, 2),
(6, 2), (4, 3), (5, 3), (6, 3), (5, 4), (6, 4), (6, 5)}.
So, event A consists of four and event B consists of fifteen sample
points.
Again $(B \cap A)$ consists of only two sample points, viz. (6, 3) and
(5, 4).
$\therefore\ P(B \cap A) = \frac{2}{36} = \frac{1}{18}$.
Also, $P(B) = \frac{15}{36}$.
Now, $P(B \cap A) = P(B)\,P(A \mid B)$.
$\therefore\ \frac{1}{18} = \frac{15}{36}\,P(A \mid B)$
$\Rightarrow P(A \mid B) = \frac{1}{18} \times \frac{36}{15} = \frac{2}{15}$.
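
The same conditional probability can be read off from the reduced sample space B by listing the 36 outcomes. The variable names in this short sketch are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))      # (first die, second die)
A = {(i, j) for i, j in space if i + j == 9}      # total of 9
B = {(i, j) for i, j in space if i > j}           # second die shows less than the first

p_a_given_b = Fraction(len(A & B), len(B))        # counting within the reduced space B
print(len(B), sorted(A & B), p_a_given_b)         # 15 [(5, 4), (6, 3)] 2/15
```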


EXAMPLE : 


Two persons X and Y appear in an interview for ‘two vacancies 
in the same post. The probability of X’s selection is'% and that of Y's 
selection is $, what is the probability that 
(i) both X and Y will be selected,
(ii) only one of them will be selected,
and (iii) none of them will be selected.
SOLUTION :
Let A = {X will be selected},
and B = {Y will be selected}.
(i) In this case the events A and B are independent.
P(AN B)=P(A).P(B)=4x 4= 7, 
Probability that both X and Y will be selected = ve. 


(ii) The events A and $\bar{B}$ are independent and so also the events
$\bar{A}$ and B, since events A and B are independent.


$\therefore$ P(only one of them will be selected)
$= P(A \cap \bar{B}) + P(\bar{A} \cap B)$
$= P(A)\,P(\bar{B}) + P(\bar{A})\,P(B)$
$= P(A)\{1 - P(B)\} + \{1 - P(A)\}\,P(B)$
=3.1-4)+ (1-44 
=h4444 
=a55%. 
(iii) The events $\bar{A}$ and $\bar{B}$ are independent, since the events A and
B are independent.
$\therefore$ P(none of X and Y will be selected)
$= P(\bar{A} \cap \bar{B})$
$= P(\bar{A})\,P(\bar{B})$
$= \{1 - P(A)\}\{1 - P(B)\}$
=(1-$).-4) 


=g4=4%. 


EXAMPLE : 


Three bags contain respectively 5 white, 3 black balls ; 7 white,
8 black balls ; 4 white, 5 black balls. One bag is chosen at random
and a ball from it is also chosen at random. What is the probability
that the ball is white ?


SOLUTION : 
Let A={ball drawn is white}, 
and B;={i-th bag is chosen}; i=1, 2,3. 


The events $B_1$, $B_2$, $B_3$ are mutually exclusive and exhaustive, and
none of them is impossible.
Then $P(B_1) = \frac{1}{3}$, $P(B_2) = \frac{1}{3}$ and $P(B_3) = \frac{1}{3}$


and P(A|B,)=2—= P(A|Bs)=7% and P(A|B;)=4. 


$\therefore\ P(A) = P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3)$
=$. fst. gett. $ 
=4Urst+25+4) 
=rosp. 
EXAMPLE :


There are two men aged 30 and 36 years. The probability of
living 35 years more is 0.67 for the 30 years old and 0.60 for the 36 years
old person. Find the probability that at least one of these persons will
be alive 35 years hence.


SOLUTION :


Let A be the event that the 30 years old person will die within 35
years and B be the event that the 36 years old person will die within
35 years.

$P(A) = 1 - 0.67 = 0.33$
$P(B) = 1 - 0.60 = 0.40$.

Since the events A and B are independent, the probability that
both persons will die within 35 years is given by :
$P(A \cap B) = P(A)\,P(B) = 0.33 \times 0.40 = 0.132$.

$\therefore$ the probability that at least one of the persons will be alive
35 years hence is,

$1 - P(A \cap B) = 1 - 0.132 = 0.868$.


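EXAMPLE :

Urn-1 contains 5 red and 5 black balls, urn-2 contains 4 red and
8 black balls and urn-3 contains 3 red and 6 black balls. One urn is
chosen at random and a ball is drawn. The colour of the ball is black.
What is the probability that it has been drawn from urn-3 ?
SOLUTION :

Let A = {A black ball is drawn}

and $B_i$ = {i-th urn is chosen} ; i = 1, 2, 3.

Then, $P(B_1) = \frac{1}{3}$, $P(B_2) = \frac{1}{3}$ and $P(B_3) = \frac{1}{3}$ ;

$P(A \mid B_1) = \frac{5}{10}$ ; $P(A \mid B_2) = \frac{8}{12}$ and $P(A \mid B_3) = \frac{6}{9}$.
$\therefore$ P(urn-3 is chosen | ball drawn is black)

$$\begin{aligned}
&= P(B_3 \mid A) \\
&= \frac{P(B_3)\,P(A \mid B_3)}{P(B_1)\,P(A \mid B_1) + P(B_2)\,P(A \mid B_2) + P(B_3)\,P(A \mid B_3)} \\
&= \frac{\frac{1}{3} \cdot \frac{6}{9}}{\frac{1}{3} \cdot \frac{5}{10} + \frac{1}{3} \cdot \frac{8}{12} + \frac{1}{3} \cdot \frac{6}{9}} = \frac{4}{11}.
\end{aligned}$$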

EXAMPLE : 
A fair coin is tossed twice. Show that the events,
A = {Head on the first coin},
B = {Head on the second coin},
C = {Head on one coin only},
are pairwise independent but not independent themselves.


SOLUTION :


Here, the sample space contains four sample points, viz. HH,
HT, TH and TT. The event A contains 2 sample points, HH and HT ;
B contains two points, HH and TH ; and C also contains two points,
HT and TH.


Therefore, we have,
$P(A) = \frac{2}{4} = \frac{1}{2}$, $P(B) = \frac{2}{4} = \frac{1}{2}$, $P(C) = \frac{2}{4} = \frac{1}{2}$.
$\therefore\ P(A)\,P(B) = \frac{1}{4}$, $P(B)\,P(C) = \frac{1}{4}$ and $P(C)\,P(A) = \frac{1}{4}$.


Also, $A \cap B$ contains one sample point, HH, $A \cap C$ contains one
point, HT, and $B \cap C$ contains one point, TH.


$\therefore\ P(A \cap B) = \frac{1}{4}$, $P(A \cap C) = \frac{1}{4}$ and $P(B \cap C) = \frac{1}{4}$,
implying $P(A \cap B) = P(A)\,P(B)$, $P(A \cap C) = P(A)\,P(C)$
and $P(B \cap C) = P(B)\,P(C)$,
that is, events A, B and C are pairwise independent.
But $P(A \cap B \cap C) = P(\phi) = 0$ and $P(A)\,P(B)\,P(C) = \frac{1}{8}$.
$\therefore\ P(A \cap B \cap C) \neq P(A)\,P(B)\,P(C)$,
that is, the events A, B, C are not independent.
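
The distinction between pairwise independence and independence of all three events can be verified mechanically. A small sketch follows, with names such as `pairwise` and `mutually_independent` chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations

space = {"HH", "HT", "TH", "TT"}      # two tosses of a fair coin
events = {
    "A": {"HH", "HT"},                # head on the first coin
    "B": {"HH", "TH"},                # head on the second coin
    "C": {"HT", "TH"},                # head on one coin only
}

def p(event):
    return Fraction(len(event), len(space))

# Product rule for every pair of events.
pairwise = all(
    p(events[x] & events[y]) == p(events[x]) * p(events[y])
    for x, y in combinations(events, 2)
)
# Product rule for all three events together.
triple = p(events["A"] & events["B"] & events["C"])
product_of_three = p(events["A"]) * p(events["B"]) * p(events["C"])
mutually_independent = pairwise and triple == product_of_three

print(pairwise, triple, product_of_three, mutually_independent)   # True 0 1/8 False
```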


Repeated Trials.


Let a random experiment have two outcomes : the occurrence of
an event A, called a success, or its non-occurrence, called a failure. If the
probability of success in any trial is p and the probability of failure
in any trial is $q = 1 - p$, then the probability of r successes in n
independent trials is given by

$$P_n(r) = {}^{n}C_r\, p^r q^{n-r}.$$

For, the probability that the first r trials produce successes at
each trial and the remaining $(n - r)$ trials produce only failures at
each trial is, by the theorem of Compound Probability,

$$p \cdot p \cdots p \cdot q \cdot q \cdots q = p^r q^{n-r},$$

since there are r factors p and $(n - r)$ factors q.

Similarly, the probability for any fixed sequence of r successes
and $(n - r)$ failures is $p^r q^{n-r}$. The number of such sequences is ${}^{n}C_r$,
for we must choose exactly r positions out of n for the successes and
the remaining $(n - r)$ positions for the failures. Since these ${}^{n}C_r$
outcomes are mutually exclusive, the probability of r successes in n
independent trials, by the theorem of addition of probabilities, is
${}^{n}C_r\, p^r q^{n-r}$.

Note. The observations of a random experiment are often referred to as trials.
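
A minimal sketch of this formula in code is given below; the function name `binomial_probability` is an assumption made here. The three calls use the data of the worked examples of this section (4 heads with 7 coins, 2 defectives among 5 articles when 20% are defective, 3 boys among 5 children with p taken as 1/2).

```python
from fractions import Fraction
from math import comb

def binomial_probability(n, r, p):
    """Probability of exactly r successes in n independent trials: nCr p^r q^(n-r)."""
    p = Fraction(p)
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

print(binomial_probability(7, 4, Fraction(1, 2)))   # 35/128
print(binomial_probability(5, 2, Fraction(1, 5)))   # 128/625, i.e. 0.2048
print(binomial_probability(5, 3, Fraction(1, 2)))   # 5/16
```

Exact fractions are used so that the results can be compared with hand computation without rounding.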


EXAMPLE : 


Find the probability of getting 4 heads and 3 tails in tossing
7 coins.


SOLUTION :

Assuming that the coins are fair, tossing of 7 coins is the same as
tossing a coin 7 times.

Let the event ‘appearance of a head’ be called a success.


Now, probability of getting a head $= \frac{1}{2}$.
$\therefore$ the probability of success in each trial $= p = \frac{1}{2}$.
$\therefore\ q = 1 - p = 1 - \frac{1}{2} = \frac{1}{2}$.
Here the trials are independent, since p or q is not affected by
the results of any other tossing.
Therefore the probability that there will be 4 heads (and hence
3 tails) in 7 trials is,

$${}^{7}C_4 \left(\tfrac{1}{2}\right)^4 \left(\tfrac{1}{2}\right)^3 = \frac{7 \cdot 6 \cdot 5 \cdot 4}{1 \cdot 2 \cdot 3 \cdot 4} \cdot \frac{1}{128} = \frac{35}{128}.$$

EXAMPLE : 


A factory produces articles among which 20% are defective. If
5 articles are selected at random from a day’s production, find the
probability that there will be exactly 2 defectives.
SOLUTION :
Let the event ‘occurrence of a defective article’ be called a
success. Then,
p = probability that an article is defective $= \frac{20}{100} = 0.20$
$\therefore\ q = 1 - 0.20 = 0.80$.
Here, the trials are independent, since the probability of
occurrence of a defective article in any trial is not affected by the
occurrence or non-occurrence of a defective article in any other trial.


Therefore, the probability that there will be exactly 2 defectives
$= {}^{5}C_2 (0.2)^2 (0.8)^3 = 0.2048$.
EXAMPLE :
Find the probability of having 3 boys in a family with 5 children.


SOLUTION :


Let the event of having a boy be called a success. Then,
p = probability of having a boy $= \frac{1}{2}$.
$\therefore\ q = 1 - \frac{1}{2} = \frac{1}{2}$.


Here, the trials are independent since the probability of having a
boy or a girl is not affected by the result of any other trial.
Therefore, the probability that there will be 3 boys (and hence
2 girls) in a family of 5 children is

$${}^{5}C_3 \left(\tfrac{1}{2}\right)^3 \left(\tfrac{1}{2}\right)^{5-3} = \frac{5 \times 4 \times 3}{3 \times 2 \times 1} \cdot \frac{1}{32} = \frac{5}{16}.$$
Random Variable 


A variable whose value is a numerical quantity determined by
the outcome of a random experiment is called a random variable.


EXAMPLE :


The experiment of tossing a number of coins, say 3, has eight
possible outcomes HHH, HHT, THH, HTH, HTT, THT, TTH, TTT
but these outcomes are not numerical. We may, however, associate
the numbers 0, 1, 2, 3 corresponding to the four possibilities regarding
the number of heads that appear in 3 coins. If we now let the variable
x represent the number of heads observed in the experiment, then the
possible values that x can have are 0, 1, 2, 3. Since the value of the
variable x is a number determined by the outcome of an experiment,
it is a random variable.
Random variables are of two types : (i) continuous and (ii) discrete.


Discrete Random Variable :
A random variable that can assume only values that can be
counted is called a discrete random variable.


Examples of discrete random variables are the number of 
defective items in a lot, number of accidents in a month, etc. 


Continuous Random Variable : 


A random variable is continuous if it can assume any value 
within a given range. It has infinite possible values. 

Examples of continuous random variables are average cost per
unit of an item, weight of students in a class, length of a telephone
conversation, etc.

Obviously to each value of the random variable there corresponds
a definite probability and using this a probability distribution is defined
as follows :


DEFINITION :

Probability distribution is a systematic arrangement of the
possible values of a random variable and their corresponding
probabilities.

For example, in tossing a balanced die once, if the variable x
represents the number of spots that appears uppermost, it takes on the
values 1, 2, ..., 6 with corresponding probabilities $\frac{1}{6}$ for each. Hence,


Probability distribution table associated with tossing a balanced die.
No. of spots (x) :   1     2     3     4     5     6
Probability      :  1/6   1/6   1/6   1/6   1/6   1/6

Also the following is the probability distribution table associated
with the tossing of a fair coin thrice.

Probability distribution table associated with tossing a fair coin
thrice.
No. of Heads (x) :   0     1     2     3
Probability      :  1/8   3/8   3/8   1/8

In this example, the probability of getting no head is
${}^{3}C_0 (\frac{1}{2})^0 (\frac{1}{2})^3 = \frac{1}{8}$,
the probability of getting one head is ${}^{3}C_1 (\frac{1}{2})^1 (\frac{1}{2})^{3-1} = \frac{3}{8}$,
the probability of getting two heads is ${}^{3}C_2 (\frac{1}{2})^2 (\frac{1}{2})^{3-2} = \frac{3}{8}$
and the probability of getting three heads is ${}^{3}C_3 (\frac{1}{2})^3 (\frac{1}{2})^{3-3} = \frac{1}{8}$.

Mathematical Expectation :


Let x be a random variable with possible values $x_1, x_2, \ldots, x_n$
with corresponding probabilities $p_1, p_2, \ldots, p_n$ ; then the expected value
of the random variable or the mathematical expectation of the random
variable, E(x), is defined as

$$p_1 x_1 + p_2 x_2 + \cdots + p_n x_n.$$

Thus, $E(x) = p_1 x_1 + p_2 x_2 + \cdots + p_n x_n$.


The expected value E(x) is also called the mean of x and is
denoted by $\bar{x}$ or m.


Some Properties of Expected Values :
(1) The expected value of a constant is the constant itself,
i.e., $E(a) = a$.
(2) $E(a + bx) = a + b\,E(x)$, where a, b are constants,
for,

$$\begin{aligned}
E(a + bx) &= p_1(a + bx_1) + p_2(a + bx_2) + \cdots + p_n(a + bx_n) \\
&= a(p_1 + p_2 + \cdots + p_n) + b(x_1 p_1 + x_2 p_2 + \cdots + x_n p_n) \\
&= a + b\,E(x), \text{ since } p_1 + p_2 + \cdots + p_n = 1.
\end{aligned}$$

(3)
$$\begin{aligned}
E(a + bx)^2 &= E(a^2 + 2abx + b^2x^2) \\
&= p_1(a^2 + 2abx_1 + b^2x_1^2) + p_2(a^2 + 2abx_2 + b^2x_2^2) + \cdots + p_n(a^2 + 2abx_n + b^2x_n^2) \\
&= a^2(p_1 + p_2 + \cdots + p_n) + 2ab(p_1x_1 + p_2x_2 + \cdots + p_nx_n) + b^2(p_1x_1^2 + p_2x_2^2 + \cdots + p_nx_n^2) \\
&= a^2 + 2ab\,E(x) + b^2\,E(x^2).
\end{aligned}$$

(4) If x and y be two random variables, then
$E(x + y) = E(x) + E(y)$.
(5) If x and y be two independent random variables, then
$E(xy) = E(x)\,E(y)$.


Variance of a Random Variable : 


Let x be a random variable with mean $E(x) = m$. Then the
variance of x is defined as

$$\begin{aligned}
\text{Var}(x) &= E[(x - E(x))^2] = E[(x - m)^2] \\
&= (x_1 - m)^2 p_1 + (x_2 - m)^2 p_2 + \cdots + (x_n - m)^2 p_n \\
&= (x_1^2 - 2x_1 m + m^2)p_1 + (x_2^2 - 2x_2 m + m^2)p_2 + \cdots + (x_n^2 - 2x_n m + m^2)p_n \\
&= (x_1^2 p_1 + x_2^2 p_2 + \cdots + x_n^2 p_n) - 2m(x_1 p_1 + x_2 p_2 + \cdots + x_n p_n) + m^2(p_1 + p_2 + \cdots + p_n) \\
&= E(x^2) - 2m\,E(x) + m^2, \text{ since } p_1 + p_2 + \cdots + p_n = 1 \\
&= E(x^2) - 2E(x)\,E(x) + [E(x)]^2 \\
&= E(x^2) - [E(x)]^2 \\
&= E(x^2) - m^2.
\end{aligned}$$

Thus, $\text{Var}(x) = E(x^2) - [E(x)]^2 = E(x^2) - m^2$.

$\therefore\ \text{S.D.}(x) = \sqrt{E(x^2) - m^2}$.


EXAMPLE : 


If a coin is tossed 50 times, in how many of these tosses is it
expected to find the ‘head’ ?


SOLUTION :
In a single toss, the probability of obtaining a head is $\frac{1}{2}$.


$\therefore$ expected number of tosses with ‘head’ out of 50 tosses
$= 50 \times \frac{1}{2} = 25$.


EXAMPLE : 


The following is the probability distribution of a random
variable :


x           :   2     3     4     5
Probability :  0.2   0.4   0.3   0.1


Find the expected value and variance of the random variable.


SOLUTION :

We have,

$$\begin{aligned}
E(x) &= x_1 p_1 + x_2 p_2 + x_3 p_3 + x_4 p_4 \\
&= 2 \times 0.2 + 3 \times 0.4 + 4 \times 0.3 + 5 \times 0.1 \\
&= 0.4 + 1.2 + 1.2 + 0.5 \\
&= 3.3,
\end{aligned}$$

$$\begin{aligned}
\text{Var}(x) &= E[\{x - E(x)\}^2] \\
&= \{x_1 - E(x)\}^2 p_1 + \{x_2 - E(x)\}^2 p_2 + \{x_3 - E(x)\}^2 p_3 + \{x_4 - E(x)\}^2 p_4 \\
&= (2 - 3.3)^2 \times 0.2 + (3 - 3.3)^2 \times 0.4 + (4 - 3.3)^2 \times 0.3 + (5 - 3.3)^2 \times 0.1 \\
&= 1.69 \times 0.2 + 0.09 \times 0.4 + 0.49 \times 0.3 + 2.89 \times 0.1 \\
&= 0.338 + 0.036 + 0.147 + 0.289 \\
&= 0.81.
\end{aligned}$$

[Note. S.D. of the random variable $= \sqrt{\text{Var}(x)} = \sqrt{0.81} = 0.9$.]

EXAMPLE : 


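Find the mathematical expectation of the number of points if
a balanced die is rolled.


SOLUTION :
Here the random variable takes on the values 1, 2, 3, 4, 5, 6 with
corresponding probabilities $\frac{1}{6}$ for each.
Hence,

$$\begin{aligned}
E(x) &= 1 \cdot \tfrac{1}{6} + 2 \cdot \tfrac{1}{6} + 3 \cdot \tfrac{1}{6} + 4 \cdot \tfrac{1}{6} + 5 \cdot \tfrac{1}{6} + 6 \cdot \tfrac{1}{6} \\
&= \tfrac{1}{6}(1 + 2 + 3 + 4 + 5 + 6) \\
&= \tfrac{21}{6} = \tfrac{7}{2} = 3.5.
\end{aligned}$$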


EXAMPLE : 


Find the mathematical expectation of receiving a tail when
a balanced coin is tossed twice.


SOLUTION :


Outcomes are (H, H) ; (H, T) ; (T, H) ; (T, T). So, there is 1 way
of getting two tails, 2 ways of getting one tail and 1 way of getting no
tail.

Now, P(No tail) $= 0.5 \times 0.5 = 0.25$ ;
*P(One tail) $= 0.5 \times 0.5 + 0.5 \times 0.5 = 0.50$ ;
P(Two tails) $= 0.5 \times 0.5 = 0.25$.

If the variable x represents the number of tails, it takes the
values 0, 1 and 2 with corresponding probabilities 0.25, 0.5 and 0.25.
Hence,

$E(x) = 0 \times 0.25 + 1 \times 0.5 + 2 \times 0.25 = 1$.


* The probability of getting one tail is the sum of the probabilities of HT and TH,
and similarly for the other probabilities.

EXAMPLE :
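Find the expected value and the variance of the number of points
in rolling two balanced dice.
SOLUTION :
Here the random variable takes on the values 2, 3, 4, ..., 12 with
corresponding probabilities $\frac{1}{36}, \frac{2}{36}, \frac{3}{36}, \frac{4}{36}, \frac{5}{36}, \frac{6}{36}, \frac{5}{36}, \frac{4}{36}, \frac{3}{36}, \frac{2}{36}, \frac{1}{36}$.

$$\begin{aligned}
\therefore\ E(x) &= 2 \times \tfrac{1}{36} + 3 \times \tfrac{2}{36} + 4 \times \tfrac{3}{36} + 5 \times \tfrac{4}{36} + 6 \times \tfrac{5}{36} + 7 \times \tfrac{6}{36} \\
&\quad + 8 \times \tfrac{5}{36} + 9 \times \tfrac{4}{36} + 10 \times \tfrac{3}{36} + 11 \times \tfrac{2}{36} + 12 \times \tfrac{1}{36} = \tfrac{252}{36} = 7.
\end{aligned}$$

Variance :

$$\begin{aligned}
\text{Var}(x)\,[= \sigma^2] &= E[\{x - E(x)\}^2] \\
&= (2-7)^2 \times \tfrac{1}{36} + (3-7)^2 \times \tfrac{2}{36} + (4-7)^2 \times \tfrac{3}{36} + (5-7)^2 \times \tfrac{4}{36} \\
&\quad + (6-7)^2 \times \tfrac{5}{36} + (7-7)^2 \times \tfrac{6}{36} + (8-7)^2 \times \tfrac{5}{36} + (9-7)^2 \times \tfrac{4}{36} \\
&\quad + (10-7)^2 \times \tfrac{3}{36} + (11-7)^2 \times \tfrac{2}{36} + (12-7)^2 \times \tfrac{1}{36} \\
&= \tfrac{25}{36} + \tfrac{32}{36} + \tfrac{27}{36} + \tfrac{16}{36} + \tfrac{5}{36} + 0 + \tfrac{5}{36} + \tfrac{16}{36} + \tfrac{27}{36} + \tfrac{32}{36} + \tfrac{25}{36} \\
&= \tfrac{210}{36} = \tfrac{35}{6}.
\end{aligned}$$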
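
The distribution of the total on two dice need not be written out by hand; it can be built by enumeration. The sketch below is illustrative, with variable names chosen here, and recomputes the mean and variance from the 36 equally likely outcomes.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely outcomes give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
dist = {total: Fraction(n, 36) for total, n in counts.items()}

mean = sum(x * p for x, p in dist.items())
var = sum((x - mean) ** 2 * p for x, p in dist.items())
print(mean, var)   # 7 35/6
```

The mean 7 is twice the expectation 3.5 of a single die, in line with property (4) of expected values.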


EXAMPLE : 


A man is to play a game as follows :

In three tosses of a balanced coin, he will get a reward of
Rs. 20,000, Rs. 10,000, Rs. 5,000 and no reward if he gets three tails,
two tails, one tail and no tail respectively. The entrance fee for the
contest is Rs. 6,000. Will he play the game ?


SOLUTION :
He will like to play the game if his expected return is more than
Rs. 6,000 (the entrance fee).

No. of tails   Pay-off (x)   Probability (p)                  Expected value (px)
0              0             0.5 × 0.5 × 0.5 = 0.125           0
1              5,000         *3(0.5 × 0.5 × 0.5) = 0.375       1875
2              10,000        3(0.5 × 0.5 × 0.5) = 0.375        3750
3              20,000        0.5 × 0.5 × 0.5 = 0.125           2500
                                                     Total     8125


His expected return is Rs. 8,125 and since this expected return is
more than Rs. 6,000, the entrance fee, he will play the game.
* The probability of getting exactly one tail is the sum of the probabilities
of HHT, HTH and THH, and similarly for the other probabilities.
Note. By expected return we mean that if he plays long enough then on an
average he may get Rs. 8,125.
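
The expected return can also be obtained by listing the eight equally likely outcomes of three tosses. In this sketch the pay-offs are those used in the table of the solution (0, 5,000, 10,000 and 20,000 for 0, 1, 2 and 3 tails); the variable names are chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

payoff = {0: 0, 1: 5000, 2: 10000, 3: 20000}   # reward in Rs. by number of tails
entrance_fee = 6000

outcomes = list(product("HT", repeat=3))       # 8 equally likely outcomes
expected_return = sum(
    Fraction(payoff[outcome.count("T")], len(outcomes)) for outcome in outcomes
)
print(expected_return, expected_return > entrance_fee)   # 8125 True
```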


Miscellaneous Examples 
EXAMPLE : 


A bag contains 8 red balls and 5 white balls. Two successive
draws of 3 balls are made without replacement. Find the probability
that the first drawing will give 3 white balls and the second 3 red
balls. (C. A. 1978)


SOLUTION :


Total number of balls = 8 + 5 = 13.
3 balls can be drawn from 13 balls in ${}^{13}C_3 = 286$ ways.
So, the sample space consists of 286 sample points.
Let A = {Three balls drawn in 1st drawing are white}
and B = {Three balls drawn in 2nd drawing are red}.
The event A consists of ${}^{5}C_3 = 10$ sample points.
$\therefore\ P(A) = \frac{10}{286}$.
Now, since the 3 white balls drawn in the 1st drawing are not
replaced, the sample space now reduces to ${}^{10}C_3 = 120$ sample points,
as there are now only 8 red and 2 white balls left and 3 balls can be
drawn from 10 balls in ${}^{10}C_3$ ways.


The event B consists of ${}^{8}C_3 = 56$ sample points as 3 red balls
can be drawn from 8 red balls in ${}^{8}C_3$ ways.


$\therefore\ P(B \mid A) = \frac{56}{120}$.
$\therefore$ by the theorem of compound probability,
$P(A \cap B) = P(A)\,P(B \mid A)$
$= \frac{10}{286} \times \frac{56}{120} = \frac{7}{429}$.


EXAMPLE : 


A bag contains 8 red balls and 5 white balls. Two successive
drawings of 3 balls are made with replacement. Find the probability
that the first drawing will give 3 white balls and the second
3 red balls.


SOLUTION :


Total number of balls = 8 + 5 = 13.
Out of 13 balls, 3 balls can be drawn in ${}^{13}C_3 = 286$ ways.
So, the sample space consists of 286 sample points.
Let A = {Three balls drawn in first drawing are white}
and B = {Three balls drawn in second drawing are red}.
Now, out of 5 white balls, 3 can be drawn in ${}^{5}C_3 = 10$ ways.
$\therefore$ the event A consists of 10 sample points and hence
$P(A) = \frac{10}{286}$.
Since the 3 white balls drawn in the first drawing are replaced,
the events A and B are independent and the event B consists of
${}^{8}C_3 = 56$ sample points.
$\therefore\ P(B) = \frac{56}{286}$.
$\therefore$ by the multiplication rule,
$P(A \cap B) = P(A)\,P(B) = \frac{10}{286} \times \frac{56}{286} = \frac{140}{20449}$.
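
The counting in the last two examples (8 red and 5 white balls, drawings of 3 balls) can be reproduced with binomial coefficients. This is a sketch only; the variable names are chosen here, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

red, white, drawn = 8, 5, 3
total = red + white

p_a = Fraction(comb(white, drawn), comb(total, drawn))           # 3 white first: 10/286

# Without replacement: 8 red and 2 white balls remain for the second drawing.
p_b_given_a = Fraction(comb(red, drawn), comb(total - drawn, drawn))   # 56/120
without_replacement = p_a * p_b_given_a

# With replacement: the bag is restored, so the drawings are independent.
p_b = Fraction(comb(red, drawn), comb(total, drawn))             # 56/286
with_replacement = p_a * p_b

print(without_replacement, with_replacement)   # 7/429 140/20449
```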
EXAMPLE : 


An article manufactured by a company consists of two parts
A and B. In the process of manufacture of part A, 9 out of 100 are
likely to be defective. Similarly 5 out of 100 are likely to be defective
in the manufacture of part B. Calculate the probability that the
assembled article will not be defective.


SOLUTION :

Let A and B denote the events that part A and part B of the
article are defective respectively. Then,

$P(A) = \frac{9}{100}$ and $P(B) = \frac{5}{100}$.

The probability that part A will not be defective is

$P(\bar{A}) = 1 - P(A) = 1 - \frac{9}{100} = \frac{91}{100}$.

Similarly, the probability that part B will not be defective is

$P(\bar{B}) = 1 - P(B) = 1 - \frac{5}{100} = \frac{95}{100}$.
The two events A and B are independent and consequently the
two complementary events $\bar{A}$ and $\bar{B}$ are independent. Hence by the
multiplication rule of probability,

$$P(\bar{A} \cap \bar{B}) = P(\bar{A})\,P(\bar{B}) = \frac{91}{100} \times \frac{95}{100} = \frac{8645}{10000} = 0.8645 \approx 0.86.$$


EXAMPLE : 
The probability that X can solve a problem in Business Statistics
is 2, that Y can solve it is %, that Z can nates ihe fe “a bre Jew ee 


independently, find the probability that the problem is solved.
SOLUTION :
Let A, B and C denote the events that the problem in Business
Statistics is solved by the students X, Y and Z respectively. Then
P(A)=%, P(B)=2 and P(C)=5. 
The problem will be solved if at least one of the events A, B, C occurs.
Hence, by the addition rule of probability,

$$\begin{aligned}
P(A \cup B \cup C) &= P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(C \cap A) + P(A \cap B \cap C) \\
&= P(A) + P(B) + P(C) - P(A)\,P(B) - P(B)\,P(C) - P(C)\,P(A) + P(A)\,P(B)\,P(C)
\end{aligned}$$

(since events A, B, C are independent)
mitg+$-25-34-G2+235 


ALTERNATIVE METHOD : 
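We have, $P(A \cup B \cup C) = 1 - P(\bar{A} \cap \bar{B} \cap \bar{C})$
$= 1 - P(\bar{A})\,P(\bar{B})\,P(\bar{C})$ (A, B, C are
independent events, so also $\bar{A}$, $\bar{B}$, $\bar{C}$)
$= 1 - \{1 - P(A)\}\{1 - P(B)\}\{1 - P(C)\}$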
=1-(1-4 (1-4 (1-$) 
=1-2.5.6=383. 


EXAMPLE : 


A bag contains 5 red and 4 black balls. A ball is drawn at
random from the bag and put into another bag which contains 3 red
and 7 black balls. A ball is drawn randomly from the second bag.
What is the probability that it is red ? (C. U. 1970)


SOLUTION :


Let A = event of transferring a red ball from the first bag,
B = event of transferring a black ball from the first bag,
C = event of drawing a red ball from the second bag.
To draw a red ball from the second bag we should have either


(i) a red ball is transferred from the first bag to the second bag
and a red ball is drawn from it,


or, (ii) a black ball is transferred from the first bag to the second bag
and a red ball is drawn from it.
That is, we should have either (A and C) or (B and C), i.e., either
$(A \cap C)$ or $(B \cap C)$.


Now, since the events $A \cap C$ and $B \cap C$ are mutually exclusive,
the required probability is

$$\begin{aligned}
P[(A \cap C) \text{ or } (B \cap C)] &= P[(A \cap C) \cup (B \cap C)] \\
&= P(A \cap C) + P(B \cap C) \\
&= P(A)\,P(C \mid A) + P(B)\,P(C \mid B).
\end{aligned}$$

Now, $P(A) = \frac{5}{9}$, $P(B) = \frac{4}{9}$ ;
$P(C \mid A) = \frac{4}{11}$ (since a red ball is transferred from the 1st bag to the
2nd bag),
$P(C \mid B) = \frac{3}{11}$ (since a black ball is transferred from the 1st bag to
the 2nd bag).
$\therefore$ required probability $= \frac{5}{9} \cdot \frac{4}{11} + \frac{4}{9} \cdot \frac{3}{11} = \frac{32}{99}$.
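
The two cases of this solution (a red or a black ball transferred) amount to an application of Cor. 1 on total probability, and can be looped over in code. A minimal sketch, with dictionary names chosen here for illustration:

```python
from fractions import Fraction

first_bag = {"red": 5, "black": 4}
second_bag = {"red": 3, "black": 7}

p_red = Fraction(0)
for colour, count in first_bag.items():
    p_transfer = Fraction(count, sum(first_bag.values()))   # P(A) or P(B)
    after = dict(second_bag)
    after[colour] += 1                                      # second bag after the transfer
    p_red_given_transfer = Fraction(after["red"], sum(after.values()))
    p_red += p_transfer * p_red_given_transfer              # add P(A)P(C|A), then P(B)P(C|B)

print(p_red)   # 32/99
```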


EXAMPLE : 


An urn contains 5 red and 4 black balls and another urn contains
8 red and 7 black balls. If one ball is drawn from each urn, find the
probabilities that (i) both are of same colour and (ii) both are of different
colours.


Sonurion ; 
Let R, ={Ball drawn from 1st urn is red}, 
Bi ={Ball drawn from ist urn is black}, 
Re ={Ball drawn from 2nd urn is red}, 
Be ={Ball drawn from 2nd urn is black}. 


1st part: When both the balls are of same colour. 

To obtain balls of the same colour we should have either
both balls red or both balls black, i.e., either $(R_1 \cap R_2)$ or $(B_1 \cap B_2)$.
Now, since the events $R_1 \cap R_2$ and $B_1 \cap B_2$ are mutually exclusive, we
have by the addition rule of probability,

$$\begin{aligned}
P[(R_1 \cap R_2) \text{ or } (B_1 \cap B_2)] &= P[(R_1 \cap R_2) \cup (B_1 \cap B_2)] \\
&= P(R_1 \cap R_2) + P(B_1 \cap B_2). \qquad (1)
\end{aligned}$$

Again the events $R_1$ and $R_2$ are independent and so also the
events $B_1$ and $B_2$, since drawing a ball from the 1st urn does not affect
the drawing of a ball from the 2nd urn.
$\therefore\ P(R_1 \cap R_2) = P(R_1)\,P(R_2)$, $P(B_1 \cap B_2) = P(B_1)\,P(B_2)$.
Hence from (1) the required probability $= P(R_1)\,P(R_2) + P(B_1)\,P(B_2)$
=$r0t $25 
=33 
2nd part: Proceeding in the same way as above, the required 
probability in this case is : 
PUR: Bs) or (By Re) =Pi(Ri NBs) U(By NRg)) 
=P(R,NBg)+(Bi NR) 
=P(Ri).P(Ba)+ P(B1).P(Ba) 
=$r0t $s 
= $0. 
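
The same-colour and different-colour probabilities can be computed together for any pair of urns. The sketch below is an illustration only: the helper name `colour_match` is hypothetical, and the second urn is taken to hold 3 red and 7 black balls, which is an assumption about the problem data.

```python
from fractions import Fraction


def colour_match(urn1, urn2):
    """Return (P(same colour), P(different colours)) for one draw from each urn.

    Each urn is a (red, black) pair of counts; the two draws are independent.
    """
    r1, b1 = urn1
    r2, b2 = urn2
    p_r1, p_b1 = Fraction(r1, r1 + b1), Fraction(b1, r1 + b1)
    p_r2, p_b2 = Fraction(r2, r2 + b2), Fraction(b2, r2 + b2)
    same = p_r1 * p_r2 + p_b1 * p_b2        # both red or both black
    different = p_r1 * p_b2 + p_b1 * p_r2   # one red and one black
    return same, different


# Assumed data: first urn 5 red, 4 black; second urn 3 red, 7 black.
same, different = colour_match((5, 4), (3, 7))
print(same, different)  # 43/90 47/90
```

Since the two cases are complementary, the two results add up to 1, which gives a quick check on the arithmetic.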


EXAMPLE 


Two persons X and Y appear in an interview for two vacancies 
in the same post. The probability of X’s selection is rr and that of 
Y’s selection is $+. What is the probability that (i) both of them will 
be selected, (ii) only one of them will be selected, and (iii) neither of them
will be selected?


SOLUTION :


Let A denote the event that X is selected and B denote the
event that Y is selected. Then clearly the events A and B are
independent.

(i) Probability that both of them will be selected

$= P(A \text{ and } B)$
$= P(A \cap B)$
$= P(A) \cdot P(B)$
<r — ay. 
(ii) Probability that only one will be selected


= P[(X will be selected and Y will not be selected) or (X will not
be selected and Y will be selected)]


$= P[(A \cap \bar{B}) \cup (\bar{A} \cap B)]$

$= P(A \cap \bar{B}) + P(\bar{A} \cap B)$ (since $A \cap \bar{B}$ and $\bar{A} \cap B$ are mutually
exclusive)

$= P(A) \cdot P(\bar{B}) + P(\bar{A}) \cdot P(B)$

$= P(A) \cdot [1 - P(B)] + [1 - P(A)] \cdot P(B)$


et tsht 


=#, 
(iii) Probability that neither will be selected
$= P(\bar{A} \cap \bar{B}) = P(\bar{A}) \cdot P(\bar{B}) = [1 - P(A)] \cdot [1 - P(B)]$
=(1-4).1-7) =o. = 4- 
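
The three formulas derived in parts (i) to (iii) can be collected in one small function. This is a sketch under stated assumptions: the function name `selection_probabilities` is hypothetical, and the probabilities 1/7 and 1/5 passed in at the end are illustrative values, not a restatement of the example's data.

```python
from fractions import Fraction


def selection_probabilities(p_x, p_y):
    """Return (both, exactly one, neither) for two independent selections."""
    both = p_x * p_y                                   # P(A) P(B)
    exactly_one = p_x * (1 - p_y) + (1 - p_x) * p_y    # P(A)[1-P(B)] + [1-P(A)]P(B)
    neither = (1 - p_x) * (1 - p_y)                    # [1-P(A)][1-P(B)]
    return both, exactly_one, neither


# Hypothetical selection probabilities, chosen only for illustration.
both, exactly_one, neither = selection_probabilities(Fraction(1, 7), Fraction(1, 5))
print(both, exactly_one, neither)  # 1/35 2/7 24/35
```

The three cases are mutually exclusive and exhaustive, so their probabilities must add up to 1.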


EXERCISE 14 


1. An urn contains 8 white and 3 red balls. If two balls are 
drawn at random, find the probability that (i) both are white, (ii) both 


are red, (iii) one is of each colour. (0. U. 1978) [eta #) 
2. What is the chance of picking a spade or an ace not of spade
from a pack of 52 cards ? (C. U. 1965) [4/13]


3. Four balls are drawn at random from a bag containing
5 white and 7 black balls. Find the probability of getting (i) 4 white
balls, (ii) 2 white and 2 black balls, (iii) 3 black and 1 white ball.
[1/99, 14/33, 35/99]


4. There are 3 Geologists, 4 Engineers, 2 Statisticians and
1 Doctor. A committee of 4 from among them is to be formed. Find
the probability that the committee (i) consists of one of each kind,
(ii) has at least one Geologist, (iii) has the Doctor as a member and
three others. [4/35, 5/6, 2/5]


5. Two dice are thrown. Find the probability that (i) the first
die shows 4, (ii) the total of the numbers on the dice is 9 or greater
than 9, (iii) the number on the first die is greater than the number on
the second die. [1/6, 5/18, 5/12]

6. An urn contains 3 red, 4 white and 5 black balls. Three
balls are drawn at random from the urn. Find the probability
that (i) all are black, (ii) all are of different colours. [1/22, 3/11]


7. The odds against X solving a problem are 8 to 6 and the
odds in favour of Y solving the same problem are 14 to 10. What is
the probability that if they both try, the problem will be solved at least 
by one of them ? [4$] 

8. An urn A contains 2 white and 4 black balls. Another urn
B contains 5 white and 7 black balls. A ball is transferred from urn
A to the urn B. Then a ball is drawn from urn B. Find the
probability that it will be white.


9. A, B and C, in that order, toss a coin. The first one to throw
a head wins. What are their respective chances of winning ? Assume
that the game may continue indefinitely. [4/7, 2/7, 1/7]


10. The probability that a candidate passes in Business Mathematics
is 0.60 and that he passes in Business Statistics is 0.50. What
is the probability that the candidate passes only in any one of the two
subjects ? [1/2]


11. The incidence of occupational disease is such that on the
average 20% of workers suffer from it. If 10 workers are selected at
random, find the probability that (i) exactly 2 workers suffer from the
disease, (ii) not more than two workers suffer from the disease.
(C. U. 1974) [0.302, 0.678]


12. Two dice are thrown n times in succession. What is the
probability of obtaining double-six at least once ?
(C. U. 1975) [1 - (35/36)^n]


13. A problem in Statistics is given to three students A, B, C
whose chances of solving it are 4, 4, % respectively. What is the 
probability that the problem will be solved ? {$] 


14. There are two identical boxes containing respectively
4 white and 3 red balls, and 3 white and 7 red balls. A box is chosen at
random and a ball is drawn from it. Find the probability that the
ball is white. (C. U. 1976) [61/140]


15. A person is known to hit the target 3 out of 4 shots, whereas
another person is known to hit 2 out of 3 shots. Find the probability
of the target being hit when they both try. [11/12]


16, The probability that a student passes in a Physics test is + 
and the probability that he passes both a Physics test and an English
test is 1g. The probability that he passes at least one testis $. 
What is the probability that he passes the English test ? [#] 


17. A card is drawn at random from a pack of 52 cards. If ace
counts one, king, queen and jack count 10 each and others count at
their face value, show that the expectation of the value of the card
is 85/13.


18. In a group of equal number of men and women, 10% men and
45% women are unemployed. What is the probability that a person
selected at random is unemployed ? [11/40]


19. A purse contains 3 silver coins and 4 copper coins and a
second purse contains 4 silver and 3 copper coins. If a coin is selected
at random from one of the two purses, what is the probability that it
is a silver coin ? (C. U. 1968) [1/2]

20. The probability that an entering college student will be a
graduate is 0.4. Determine the probability that out of 5 entering
students, (i) none, (ii) one, (iii) at least one, will be a graduate.
(C. U. 1964) [0.07776, 0.2592, 0.92224]


21. In a given business venture a man can make a profit of
Rs. 300 with probability 0.6 or incur a loss of Rs. 100 with probability
0.4. Calculate his expectation. (C. U. 1966) [Rs. 140]

22. A can solve 75% of the problems in this book and B can
solve 70%. What is the probability that either A or B can solve a
problem chosen at random ? [37/40]
23. A packet of 10 electronic components is known to include
3 defectives. If 4 components are randomly chosen and tested, what
is the probability of finding among them not more than one defective ?
(C. U. 1980) [0.6667]


24. Two dice are thrown. Find the expected value of the sum of
their face numbers. [7]


25. Find the expected value of the product of points on two dice.
[49/4]
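

Several of the exercises above can be checked by direct computation. The sketch below is only an illustration: the helper name `binomial_pmf` is chosen here. It covers the exercise on occupational disease (a binomial distribution with 10 workers and probability 0.2) and the two exercises on the expected sum and expected product of the points on two dice.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Occupational disease: 10 workers, each affected with probability 0.2.
exactly_two = binomial_pmf(10, 2, 0.2)
at_most_two = sum(binomial_pmf(10, k, 0.2) for k in range(3))
print(round(exactly_two, 3), round(at_most_two, 3))  # 0.302 0.678

# Two fair dice: enumerate all 36 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=2))
expected_sum = Fraction(sum(a + b for a, b in outcomes), len(outcomes))
expected_product = Fraction(sum(a * b for a, b in outcomes), len(outcomes))
print(expected_sum, expected_product)  # 7 49/4
```

Enumerating the 36 outcomes avoids relying on any formula for expectations, so it serves as an independent check on a hand calculation.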


UNIVERSITY QUESTIONS 


CALCUTTA UNIVERSITY 


BUSINESS STATISTICS [ Honours ] 
1980 


Group A 


1. (a) Draw up a blank table to show the number of employees
in a large commercial firm, classified according to (i) Sex : male and
female ; (ii) Three age-groups : below 30, 30 and above but below
45, 45 and above ; and (iii) Four income-groups : below Rs. 400,
Rs. 400 - 750, Rs. 750 - 1,000, above Rs. 1,000.


(b) The following data show the estimated savings of the
household sector in India during 1962-68, as revealed by the C.S.O.


Form of Savings Amount (Rs. crores) 
Currency                 478
Provident Fund           145
Physical                 168
Others ae Stag) 


Present the information in a suitable diagram so as to enable
comparison among the various components and also in relation to
the total.


2. (a) What are 'quartiles' of a distribution ? How do you
use them for measuring dispersion ?


(b) Calculate the mean and the median from the following
data :

Weekly Wages (Rs.)    Number of Workers
Below 10                      8
Below 20                     18
Below 30                     45
Below 40                     90
Below 50                    118
Below 60                    120


3. (a) Let $x_1, x_2, \ldots, x_n$ be the values of a variable with
frequencies $f_1, f_2, \ldots, f_n$ respectively. Prove that

$\sum_{i=1}^{n} f_i (x_i - \bar{x}) = 0$.


(b) For a group of 50 boys the mean score and the standard
deviation of scores on a test are 59.5 and 8.38 respectively. For a
group of 40 girls the same results are 54.0 and 8.28 respectively.
Find the mean and the standard deviation of the combined group of
90 children.


4. (a) Define correlation coefficient and state its important 
properties (Clearly explain all the symbols you use). 
(b) A sample of size n = 16 yields the following sums :
$\sum x = 749$, $\sum y = 7790$, $\sum y^2 = 454.81$, $\sum xy = 3156.80$ and $\sum x^2 = 49.177$.
Compute the linear regression equation of x on y.


5. (a) Discuss the merits and limitations of the moving average
method for determining the trend in the analysis of time series.


(b) The following series of observations is known to have a
business cycle with a period of 4 years. Find the trend values by
the moving average method.


Year                      1970  1971  1972  1973  1974  1975
Production ('000 tons)     506   620  1086   673   588   696

Year                      1976  1977  1978  1979  1980
Production ('000 tons)    1116   738   668   773  1189


6. (a) Discuss the different steps that have to be taken in the 
construction of a price index number. 


(b) The values of a function f(x) are given below for some
specified values of x :

x       3    4    5    9
f(x)    6    5   -2   30


Using an appropriate interpolation formula, find the value of f(7).


Group B 
7. (a) Define the terms: Null set, Disjoint sets, Finite and 
infinite sets, Complement of a set. Give one example for each. 
(b) Prove that
$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.

8. (a) Define and illustrate the following terms :
Mutually exclusive events, Exhaustive set of events,
Independent events.


(b) Boxes I and II contain respectively 4 white, 3 red and 
3 blue balls ; and 5 white, 4 red and 3 blue balls. If one ball is drawn 
at random from each box, what is the probability that both the balls 
are of the same colour ? 


9. (a) A and B are two events, not mutually exclusive,
connected with a random experiment E. If P(A) = 1/4, P(B) = 2/5
and $P(A \cup B) = 1/2$, find the values of the following probabilities :


(i) $P(A \cap B)$, (ii) $P(A \cap B^c)$, (iii) $P(A^c \cup B^c)$,
where c stands for the complement.


(b) A bag contains 7 red balls and 5 white balls. 4 balls
are drawn at random. What is the probability that (i) all of them
are red ; (ii) two of them would be red and two white ?


10. (a) How do you distinguish between 'discrete' and 'continuous'
random variables ? Illustrate your answer with suitable
examples.


(b) A random variable has the following probability
distribution :

x              4     5     6     8
Probability    0.1   0.3   0.4   0.2


Find the expectation and the standard deviation of the random
variable.


11. (a) State and prove the Addition Theorem of probability for 
two mutually exclusive events. 


(b) A packet of 10 electronic components is known to include
3 defectives. If 4 components are randomly chosen and tested, what
is the probability of finding among them more than one defective ?


12. Write notes on (any three) : 
(a) Universal set and subset, 
(b) Classical concept of probability,
(c) Random variable,
(d) Dependent and Independent events.


1981


1. (a) Discuss briefly different types of diagrams generally used 
to represent numerical data and point out the relative merits and 
demerits of diagrammatic representation compared to other methods 
used for the purpose. 


(b) Draw the histogram of the following frequency distribution
and use it to find the total number of wage-earners in the
age group 19-32 years :


Age-group              14-15  16-17  18-20  21-24  25-29  30-34  35-39
No. of wage-earners       60    140    150    110    110    100     90

2. (a) Indicate the merits and shortcomings of the different measures of
central tendency as well as the situations in which each should be used.

(b) In the following frequency distribution, two class-frequencies
are missing :
Intelligence Quotient No. of Intelligence Quotient No. of 


students students 
55— 64 2 105—114 2 
65— 74 19 f 116—124 92 
75— 84 78 125—134 14 
85— 94 rs67 0 . 185—144. -- 4 
95—104 301 


It is however known that the total frequency is 900 and the
Median is 100.048. Find the two missing frequencies.


3. (a) Prove that the sum of squares of deviations is minimum
when deviations are measured from the mean.


(b) From the following table calculate the values of (i) Mean,
(ii) Standard deviation and (iii) Coefficient of variation :


Monthly wages No. of servants Monthly wages No. of servants 


0—10 1 ' 60-60 35 
10—20 4 60—70 10 
20—30 10 70—80 7 


30—40 23 80—90 ‘i 
~40—50 Boo" in athe i 
4. (a) Show that the product-moment correlation coefficient 'r'
lies between +1 and -1.


(b) The following data give the height in inches (x) and
the weight in lb (y) of 10 students of age 17 years :
x 61 66 68 64 65 70 63 62 64 67
y 112 1238 180 115 110 195 100 118 116 126 
Calculate the correlation coefficient. 


5. (a) Explain clearly what is meant by Time series analysis.
Describe briefly the various characteristic movements of a time 
series. 


(b) Construct Indices of seasonal variation from the 
following time series data on consumption of cold drinks (in 
1,000 bottles) : 


Quarter 


Year 


1971 87. |-..70 
1972 78 | 15 
1973 16 72 


1974 


6. (a) Calculate price index numbers for 1972 with 1961
as base year from the following data by using (i) Laspeyres',
(ii) Paasche's and (iii) Fisher's method.


1961 1972 
Commodity Unit Quantity Price (Rs.) Quantity Price (Rs.) 
A Kg 6 2°00 di 4°60 
B Quintal 7 2°60 10 3°20 
6] Dozen 6 8°00 6 4°60. 
D Kg 2 1'00 9 1°80 


(b) The following table gives values of an unknown function
f(x) for certain equidistant values of x. Use a suitable interpolation
formula to find the value of the function at x = 24 :


x 18 22 26 30 34
fla) 1178514991 16958 19954 —_ag'ga9 
7. (a) Let A and B be any two sets. Prove that
$(A \cup B)^c = A^c \cap B^c$.
Illustrate the theorem by a Venn diagram.
(b) Let A = {a, b, c}, B = {a, b}, C = {a, b, d}, D = {c, d}
and E = {d}.
State which of the following statements are correct and
give reasons :
(i) $B \subset A$ (ii) $D \supset E$ (iii) $D \subset B$ (iv) $\{a\} \subset A$
8. (a) Let A and B be two independent events. Then show
that (i) $A$ and $B^c$, (ii) $A^c$ and $B^c$ are also independent.


(b) There are three men aged 60, 65 and 70 years. The
probability to live 5 years more is 0.8 for a 60 year old, 0.6 for a
65 year old and 0.3 for a 70 year old person. Find the probability
that at least two of the three persons will remain alive 5 years
hence.


9. (a) What is meant by compound event in probability ? 
State and prove the theorem of compound probability.


(b) A number is chosen at random from the set 1, 2, 3, 
..., 100 and another number is chosen at random from the set 
1, 2,..., 50. What is the expected value of the product ? 


10. Write notes on (any three) : 
(a) Frequency interpretation of probability. 
(b) Union, intersection and difference of two events. 
(c) Random variable. 
(d) Ordered pair and Cartesian product. 